symantec Gallard
Added: January 02, 2008. License: All Rights Reserved.

Presentation Transcript

HUNTING FOR METAMORPHIC ENGINES
Mark Stamp & Wing Wong
September 13, 2006

Outline
  Metamorphic software
    Both good and evil uses
  Metamorphic virus construction kits
  How effective are metamorphic engines?
    How to compare two pieces of code?
    Similarity within and between virus families
    Similarity to non-viral code
  Can we detect metamorphic viruses?
    Commercial virus scanners
    Hidden Markov models (HMMs)
    Similarity index
  Conclusion

PART I: Metamorphic Software

What is Metamorphic Software?
  Software is metamorphic provided
    All copies do the same thing
    Internal structure of copies differs
  Today almost all software is cloned
  “Good” metamorphic software…
    Mitigate buffer overflow attacks
  “Bad” metamorphic software…
    Avoid virus/worm signature detection

Metamorphic Software for Good?
  Suppose program has a buffer overflow
  If we clone the program
    One attack breaks every copy
    Break once, break everywhere (BOBE)
  If instead, we have metamorphic copies
    Each copy still has a buffer overflow
    One attack does not work against every copy
    BOBE-resistant
  Analogous to genetic diversity in biology
  A little metamorphism does a lot of good!

Metamorphic Software for Evil?
  Cloned virus/worm can be detected
    Common signature on every copy
    Detect once, detect everywhere (DODE?)
  If instead virus/worm is metamorphic
    Each copy has different signature
    Same detection does not work against every copy
    Provides DODE-resistance
  Analogous to genetic diversity in biology
  But, effective use of metamorphism here is tricky!

Crypto Analogy
  In information security, almost everything that consistently works is either
    Crypto, or
    Has a crypto analogy…
  Consider WWII ciphers
    German Enigma
      Broken by Polish and British cryptanalysts
      Design was (mostly) known to cryptanalysts
    Japanese Purple
      Broken by American cryptanalysts
      Design was (mostly) unknown to cryptanalysts

Crypto Analogy
  Cryptanalysis: break a (known) cipher
  Diagnosis: determine how an unknown cipher works (from ciphertext)
  Which was the greater achievement, breaking Enigma or Purple?
    Cryptanalysis of Enigma was harder
    Diagnosis of Purple was harder
    Can make a reasonable case for either…

Crypto Analogy
  What does this have to do with metamorphic software?
  Suppose we (the good guys) generate metamorphic copies of our software
    Bad guys can attack individual copies
    Can bad guys attack all copies?
  Bad guys can try to diagnose our metamorphic generator

Crypto Analogy
  How to diagnose metamorphic generator (from exes)?
    Reverse engineer many copies, look at differences, etc.
    Lots of work
  Diagnosis problem is hard
    If good guys can force bad guys to solve a diagnosis problem, the good guys win
  Security by obscurity? Violates (spirit of) Kerckhoffs’ Principle?
    Yes, but still may be valuable in the real world

Crypto Analogy
  What about case where bad guys write metamorphic code?
    Metamorphic viruses, for example
  Do good guys need to solve diagnosis problem?
    If so, good guys are in trouble
  Not if good guys “only” need to detect the metamorphic code (not diagnose)
    Not claiming the good guys’ job is easy
    Just claiming that there is hope…

Virus Evolution
  Viruses first appeared in the 1980s
    Fred Cohen
  Viruses must avoid signature detection
    Virus can alter its “appearance”
  Techniques employed
    encryption
    polymorphism
    metamorphism

Virus Evolution – Encryption
  Virus consists of
    decrypting module (decryptor)
    encrypted virus body
  Different encryption key yields a different virus body signature
  Weakness
    decryptor can be detected

Virus Evolution – Polymorphism
  Try to hide signature of decryptor
  Can use code emulator to decrypt putative virus dynamically
  Decrypted virus body is constant
    Signature detection is possible

Virus Evolution – Metamorphism
  Change virus body
  Mutation techniques:
    permutation of subroutines
    insertion of garbage/jump instructions
    substitution of instructions

PART II: Virus Construction Kits

Virus Construction Kits – PS-MPC
  According to Peter Szor: “… PS-MPC [Phalcon/Skism Mass-Produced Code generator] uses a generator that effectively works as a code-morphing engine…… the viruses that PS-MPC generates are not [only] polymorphic, but their decryption routines and structures change in variants…”

Virus Construction Kits – G2
  From the documentation of G2 (Second Generation virus generator): “… different viruses may be generated from identical configuration files…”

Virus Construction Kits – NGVCK
  From the documentation for NGVCK (Next Generation Virus Creation Kit): “… all created viruses are completely different in structure and opcode…… impossible to catch all variants with one or more scanstrings.…… nearly 100% variability of the entire code”
  Oh, really?

PART III: How Effective Are Metamorphic Engines?

How We Compare Two Pieces of Code
  (figure)

Virus Families – Test Data
  Four generators, 45 viruses
    20 viruses by NGVCK
    10 viruses by G2
    10 viruses by VCL32
    5 viruses by MPCGEN
  20 normal utility programs from the Cygwin bin directory

Similarity within Virus Families – Results
  (five figure slides)

NGVCK Similarity to Virus Families
  NGVCK versus other viruses
    0% similar to G2 and MPCGEN viruses
    0 – 5.5% similar to VCL32 viruses (43 out of 100 comparisons have score > 0)
    0 – 1.2% similar to normal files (only 8 out of 400 comparisons have score > 0)

NGVCK Metamorphism/Similarity
  NGVCK
    By far the highest degree of metamorphism of any kit tested
    Virtually no similarity to other viruses or normal programs
    Undetectable???

PART IV: Can Metamorphic Viruses Be Detected?

Commercial Virus Scanners
  Tested three virus scanners
    eTrust version 7.0.405
    avast! antivirus version 4.7
    AVG Anti-Virus version 7.1
  Each scanned 37 files
    10 NGVCK viruses
    10 G2 viruses
    10 VCL32 viruses
    7 MPCGEN viruses

Commercial Virus Scanners
  Results
    eTrust and avast! detected 17 (G2 and MPCGEN)
    AVG detected 27 viruses (G2, MPCGEN and VCL32)
    none of NGVCK viruses detected by the scanners tested

Hidden Markov Models (HMMs)
  state machines
    transitions between states have fixed probabilities
    each state has a probability distribution for observing a set of observation symbols
  can “train” an HMM to represent a set of data (in the form of observation sequences)
    states = features of the input data
    transition and the observation probabilities = statistical properties of features

HMM Example – the Occasionally Dishonest Casino
  (figure)

HMM Example – the Occasionally Dishonest Casino
  2 states: fair/loaded
  The switch between dice is a Markov process
  Outcomes of a roll have different probabilities in each state
  If we can only see a sequence of rolls, the state sequence is hidden
    want to understand the underlying Markov process from the observations

HMMs – the Three Problems
  1. Find the likelihood of seeing an observation sequence O given a model λ, i.e., P(O | λ)
  2. Find an optimal state sequence that could have generated a sequence O
  3. Find the model parameters λ given a sequence O, i.e., find transition and observation probabilities that maximize the probability of observing O
  There exist efficient algorithms to solve the three problems

HMM Application – Determining the Properties of English Text
  Given: a large quantity of written English text
  Input: a long sequence of observations consisting of 27 symbols (the 26 lower-case letters and the word space)
  Train a model to find the most probable parameters (i.e., solve Problem 3)
  Use trained model to score any unknown sequence of letters (and spaces) to determine whether it corresponds to English text (i.e., solve Problem 1)

HMM Application – Initial and Final Observation Probability Distributions
  (figure)

HMM Application – Results
  Observation probabilities converged, each letter belongs to one of the two hidden states
  The two states correspond to consonants and vowels
  Note: no a priori assumption was made
    HMM effectively recovered the statistically significant feature inherent in English

HMM Application – Results
  Probabilities can be sensibly interpreted for up to n = 12 hidden states
  Trained model could be used to detect English text, even if the text is “disguised” by, say, a simple substitution cipher or similar transformation

Virus Detection with HMMs
  Use hidden Markov models (HMMs) to represent statistical properties of a set of metamorphic virus variants
    Train the model on family of metamorphic viruses
    Use trained model to determine whether a given program is similar to the viruses the HMM represents

Virus Detection with HMMs
  A trained HMM
    maximizes the probabilities of observing the training sequence
    assigns high probabilities to sequences similar to the training sequence
    represents the “average” behavior if trained on multiple sequences
    represents an entire virus family, as opposed to individual viruses

Virus Detection with HMMs – Data
  Data set
    200 NGVCK viruses (160 for training, 40 for testing)
  Comparison set
    40 normal exes from Cygwin
    25 other “non-family” viruses (G2, MPCGEN and VCL32)
  25 HMM models generated and tested

Virus Detection with HMMs – Methodology
  (figure)

Virus Detection with HMMs – Results
  (figure)

Virus Detection with HMMs – Results
  Detect some other viruses “for free”

Virus Detection with HMMs
  Summary of experimental results
    All normal programs distinguished
    VCL32 viruses had scores close to NGVCK family viruses
    With proper threshold, 17 HMM models had 100% detection rate and 10 models had 0% false positive rate
    No significant difference in performance between HMMs with 3 or more hidden states

Virus Detection with HMMs – Trained Models
  Converged probabilities in HMM matrices may give insight into the features of the represented viruses
  We observe
    opcodes grouped into “hidden” states
    most opcodes in one state only
  What does this mean?
    We are not sure…

HMMs – The Trained Models
  (figure)

Detection via Similarity Index
  Straightforward similarity index can be used as detector
    To determine whether a program belongs to the NGVCK virus family, compare it to any randomly chosen NGVCK virus
    NGVCK similarity to non-NGVCK code is small
    Can use this fact to detect metamorphic NGVCK variants

Detection via Similarity Index
  (figure)

Detection via Similarity Index
  Experiment
    compare 105 programs to one selected NGVCK virus
  Results
    100% detection, 0% false positive
  Does not depend on specific NGVCK virus selected

PART V: Conclusion

Conclusion
  Metamorphic generators vary a lot
    NGVCK has highest metamorphism (10% similarity on average)
    Other generators far less effective (60% similarity on average)
    Normal files 35% similar, on average
  But, NGVCK viruses can be detected!
    NGVCK viruses too different from other viruses and normal programs

Conclusion
  NGVCK viruses not detected by commercial scanners we tested
  Hidden Markov model (HMM) detects NGVCK (and other) viruses with high accuracy
  NGVCK viruses also detectable by similarity index

Conclusion
  All metamorphic viruses tested were detectable because
    High similarity within family and/or
    Too different from normal programs
  Effective use of metamorphism by virus/worm requires
    A high degree of metamorphism and similarity to other programs
    This is not trivial!

Conclusion
  How practical is our detection method?
  We “cheat” in several ways
    Use IDA to disassemble
    Viruses not embedded in other code
    Limited testing, small no. of files, etc.
  But results appear to be robust
    If so, we can be “sloppy” (i.e., more efficient) and still get good results

The Bottom Line
  Metamorphism for “good”
    For example, buffer overflow mitigation
    A little metamorphism does a lot of good
  Metamorphism for “evil”
    For example, try to evade virus/worm signature detection
    Requires high degree of metamorphism and similarity to normal programs
    Not impossible, but not easy…

The Bottom Bottom Line
  For metamorphic software, perhaps the inherent advantage lies with the good guys rather than the bad guys
  All-too-often in information security, the advantage lies with the bad guys

References
  X. Gao, Metamorphic software for buffer overflow mitigation, MS thesis, Dept. of CS, SJSU, 2005
  P. Szor, The Art of Computer Virus Research and Defense, Addison-Wesley, 2005
  M. Stamp, Information Security: Principles and Practice, Wiley InterScience, 2005
  W. Wong, Analysis and detection of metamorphic computer viruses, MS thesis, Dept. of CS, SJSU, 2006
  W. Wong and M. Stamp, Hunting for metamorphic engines, to appear in Journal in Computer Virology

Appendix: Extra Materials

HMMs – Run Time of Training Process
  5 to 38 minutes, depending on number of states n.

HMMs – Run Time of Classifying Process
  0.008 to 0.4 milliseconds, depending on number of states N and number of opcodes T.

AVG Anti-Virus Scanning Result
  (figure)
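
Supplementary sketch: scoring a sequence with an HMM (Problem 1)

The slides state that efficient algorithms exist for the three HMM problems but do not show one. As an illustration only, the following is a minimal Python sketch of the scaled forward algorithm, which computes log P(O | λ) for the two-state "occasionally dishonest casino". The transition, emission, and initial probabilities below are hypothetical values chosen for the example, and the function names `log_likelihood` and `score_per_symbol` are likewise assumptions; none of them come from the presentation. In the virus-detection setting the observations would be opcodes rather than die faces, and the model parameters would come from training rather than being written by hand.

```python
import math

# Hypothetical parameters for the two-state casino; the slides give no numbers.
# State 0 = fair die, state 1 = loaded die. Faces 1..6 are encoded as 0..5.
A = [[0.95, 0.05],                       # state transition probabilities
     [0.10, 0.90]]
B = [[1 / 6] * 6,                        # fair die: uniform over the six faces
     [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]]     # loaded die: a six half of the time
PI = [0.5, 0.5]                          # initial state distribution


def log_likelihood(obs, a=A, b=B, pi=PI):
    """Return log P(O | model) for a non-empty sequence, via the scaled forward algorithm."""
    n = len(pi)
    # Unscaled forward values for the first observation.
    alpha = [pi[i] * b[i][obs[0]] for i in range(n)]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step through the transitions, then weight by the emission.
            alpha = [sum(alpha[i] * a[i][j] for i in range(n)) * b[j][obs[t]]
                     for j in range(n)]
        # Rescale so the values sum to 1; the scale factors multiply to P(O | model).
        scale = sum(alpha)
        log_p += math.log(scale)
        alpha = [x / scale for x in alpha]
    return log_p


def score_per_symbol(obs, a=A, b=B, pi=PI):
    """Length-normalized log-likelihood, so sequences of different length are comparable."""
    return log_likelihood(obs, a, b, pi) / len(obs)


if __name__ == "__main__":
    many_sixes = [5] * 20            # twenty sixes in a row
    no_sixes = [0, 1, 2, 3, 4] * 4   # twenty rolls with no six at all
    print(score_per_symbol(many_sixes))
    print(score_per_symbol(no_sixes))
```

Dividing by the sequence length is one simple way to compare sequences of different lengths against a single threshold; whether the authors normalized their scores this way is not stated in the transcript.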