Mantech International

Download as
 PPT
Presentation Description 

No description available

authorSTREAM Premium Service
What's up on authorSTREAM?
Views: 194
Like it  ( Likes) Dislike it  ( Dislikes)
Added: October 14, 2008 This Presentation is Public 
Presentation Category : News & Reports All Rights Reserved
Presentation Transcript

Fuzzy Hashing :Fuzzy Hashing Jesse Kornblum


Overview :Overview A Reasonable Scenario MD5 Explained Where MD5 falls down Similarity Fuzzy Hashing Examples Questions


A Reasonable Scenario :A Reasonable Scenario Disgruntled employee leaves BigCompany Joins new startup in the same field New company introduces similar product Using detailed information from Big Company One particular document Not publicly available BigCompany sues for trade secrets violation New company has eight employees


A Reasonable Scenario :A Reasonable Scenario 100GB per employee hard drive, plus 200GB on server 1TB of data total 536,000 reams of paper About 29 semi-trucks Photo courtesy sumsinnow via Flickr


MD5 Explained :MD5 Explained How MD5 (roughly) works: Start with an initial state Look at fixed size block of input Do mathy stuff with current state and block Get new state Advance to next block of input Repeat steps 2 and 3 until out of input blocks Ending state is the hash


MD5 Explained :MD5 Explained If you change one bit in the middle, you change the next state Which ends up changing the end result Is this a good thing or a bad thing? ?


Similarity :Similarity Different levels of similarity Identical Ones and zeros Displays the same Behaves the same Thematically similar Not similar


Piecewise Hashing :Piecewise Hashing Developed for integrity during imaging Divide input into fixed sized sections and hash separately Insert or delete changes all subsequent hashes 3b152e0baa367a8038373f6df 40c39f174a8756a2c266849b fdb05977978a8bc69ecc46ec


Rolling Hash :Rolling Hash It would be nice to set boundaries such that Insertions and deletions are contained within a block


Disclaimer :Disclaimer I didn’t invent this math Originally Dr. Andrew Tridgell Samba rsync was part of his thesis Modified slightly for spamsum Spam detector in his “junk code” folder User report that rsync confuses similar Word documents


Rolling Hash :Rolling Hash A different kind of hash function Produces a pseudorandom output for every position in a file Depends only on last few bytes Lots of academic work on these Just mathy tricks F o u r s c o r e -> 83,742,221 F o u r s c o r e -> 5 F o u r s c o r e -> 90,281


Rolling Hash :Rolling Hash To update state (c,x,y,z,window) for a byte d: y = y − x y = y + size * d x = x + d x = x − window[c mod size] window[c mod size] = d c = c + 1 z = z << 5 z = z XOR d return (x + y + z)


Rolling Hash :Rolling Hash We use the rolling hash to generate block boundaries Select some values as trigger points When we hit a trigger point, end the block Example Excerpt from "The Raven" by Edgar Allan Poe Triggers on ood and ore


Rolling Hash :Rolling Hash Deep into the darkness peering, long I stood there, wondering, fearing Doubting, dreaming dreams no mortals ever dared to dream before; But the silence was unbroken, and the stillness gave no token, And the only word there spoken was the whispered word, Lenore?, This I whispered, and an echo murmured back the word, "Lenore!" Merely this, and nothing more


Rolling Hash :Rolling Hash Deep into the darkness peering, long I stood there, wondering, fearing Doubting, dreaming dreams no mortals ever dared to dream before; But the silence was unbroken, and the stillness gave no token, And the only word there spoken was the whispered word, Lenore?, This I whispered, and an echo murmured back the word, "Lenore!" Merely this, and nothing more


Rolling Hash :Rolling Hash Deep into the darkness peering, long I stood there, wondering, fearing Doubting, dreaming dreams no mortals ever dared to dream before ; But the silence was unbroken, and the stillness gave no token, And the only word there spoken was the whispered word,Lenore ?, This I whispered, and an echo murmured back the word,"Lenore !" Merely this, and nothing more


Rolling Hash :Rolling Hash How do we choose the triggers? Chosen randomly, before reading the file Based on the size of the input file Really just a set of numbers Has nothing to do with type of input data


Fuzzy Hashing :Fuzzy Hashing Combine Rolling Hash with a Traditional Hash Use Fowler/Noll/Vo (FNV) hash That’s what Tridgell did Faster and less complex than MD5 We’re only using a small part of the result Start reading file, compute Rolling and Traditional Hashes When Rolling Hash triggers Record LSB of Traditional Hash value When finished, combine LSBs to make signature


Rolling Hash :Rolling Hash Deep into the darkness peering, long I stood there, wondering, fearing Doubting, dreaming dreams no mortals ever dared to dream before ; But the silence was unbroken, and the stillness gave no token, And the only word there spoken was the whispered word,Lenore ?, This I whispered, and an echo murmured back the word,"Lenore !" Merely this, and nothing more


Rolling Hash :Rolling Hash Deep into the darkness peering, long I stood there, wondering, fearing Doubting, dreaming dreams no mortals ever dared to dream before ; But the silence was unbroken, and the stillness gave no token, And the only word there spoken was the whispered word,Lenore ?, This I whispered, and an echo murmured back the word,"Lenore !" Merely this, and nothing more 28163 491522 57 145410213 738210


Rolling Hash :Rolling Hash Deep into the darkness peering, long I stood there, wondering, fearing Doubting, dreaming dreams no mortals ever dared to dream before ; But the silence was unbroken, and the stillness gave no token, And the only word there spoken was the whispered word,Lenore ?, This I whispered, and an echo murmured back the word,"Lenore !" Merely this, and nothing more 28163 491522 57 145410213 738210


Rolling Hash :Rolling Hash Deep into the darkness peering, long I stood there, wondering, fearing Doubting, dreaming dreams no mortals ever dared to dream before ; But the silence was unbroken, and the stillness gave no token, And the only word there spoken was the whispered word,Lenore ?, This I whispered, and an echo murmured back the word,"Lenore !" Merely this, and nothing more


Rolling Hash :Rolling Hash Deep into the darkness peering, long I stood there, wondering, fearing I AM THE LIZARD KING! Doubting, dreaming dreams no mortals ever dared to dream before ; But the silence was unbroken, and the stillness gave no token, And the only word there spoken was the whispered word,Lenore ?, This I whispered, and an echo murmured back the word,"Lenore !" Merely this, and nothing more


Rolling Hash :Rolling Hash Deep into the darkness peering, long I stood there, wondering, fearing I AM THE LIZARD KING! Doubting, dreaming dreams no mortals ever dared to dream before ; But the silence was unbroken, and the stillness gave no token, And the only word there spoken was the whispered word,Lenore ?, This I whispered, and an echo murmured back the word,"Lenore !" Merely this, and nothing more 28163 82910 57 145410213 738210


Rolling Hash :Rolling Hash Deep into the darkness peering, long I stood there, wondering, fearing I AM THE LIZARD KING! Doubting, dreaming dreams no mortals ever dared to dream before ; But the silence was unbroken, and the stillness gave no token, And the only word there spoken was the whispered word,Lenore ?, This I whispered, and an echo murmured back the word,"Lenore !" Merely this, and nothing more 28163 82910 57 145410213 738210


Matching :Matching Signature 1: 3 2 7 3 0 Signature 2: 3 0 7 3 0 Edit Distance Number of insertions, modifications and deletions to turn Signature 1 into Signature 2. For the example above, the edit distance is one. Signatures (and thus files) match when the ratio of the edit distance to the length is small


Demonstration :Demonstration WARNING: EXPLICIT IMAGERY LAW ENFORCEMENT SENSITIVE DO NOT DUPLICATE


Demonstration :Demonstration LAW ENFORCEMENT SENSITIVE DO NOT DUPLICATE


Demonstration :Demonstration Known kitty porn MATCH Corrupted File


Demonstration :Demonstration Known kitty porn No match Different File


Demonstration :Demonstration Known kitty porn MATCH File Header


Demonstration :Demonstration Known kitty porn MATCH File Footer


Demonstration :Demonstration Known kitty porn MATCH File Footer (attached to header)


Issues :Issues Does not work for similar looking graphics Unable to handle cropping, resizing, and other edits Confused by many small changes throughout input Computationally intensive 7-10 times slower than MD5 No way to sort signatures Must compare each input to all known signatures


Fuzzy Hashing :Fuzzy Hashing Matches similar but not identical bitstreams Great for corrupted or partial documents Also great for source code reuse Free Software http://ssdeep.sf.net/ Windows, GUI, *nix, and OS X Paper in Digital Investigation http://dfrws.org/2006/proceedings/12-Kornblum.pdf


Future Research :Future Research "That algorithm is our last hope." "No, there is another."


Questions :Questions Jesse Kornblum jesse.kornblum@mantech.com