Data Compression

Presentation Description

Data Compression for Information Systems

Presentation Transcript

Data Compression:

Joseph Ambrose
November 28, 2011
Ohio University MEM Program, EMGT 520

Why Data Compression?:

- Currently relevant to all aspects of information systems and technology
- Nearly all forms of media are "coded": images, video, audio, documents
- Need for efficient storage and transfer

Objectives:

- Define what data compression is
- Brief history of data compression
- Timeline of development
- Understand the two main types of compression
- Methodology of compression
- Cost and availability
- Benefits and drawbacks

What is Data Compression?:

“Data Compression as it pertains to computer science and information technology is a process of encoding information to use fewer bits than the original representation would use.” - Wikipedia

What is Data Compression?:

Data compression is possible because:
- Information contains redundancy
- Technology allows quick encoding and decoding, making it worthwhile to save valuable disk space and bandwidth

Original Data -> Encoder -> Compressed Data -> Decoder -> Decompressed Data
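The encoder/decoder pipeline and the role of redundancy can be illustrated with a short sketch. It assumes Python's standard `zlib` module as the codec; the sample inputs and the helper name `round_trip` are made up for illustration and are not part of the original slides.

```python
import os
import zlib

# Highly redundant input: one short phrase repeated many times.
redundant = b"data compression " * 200
# Input with essentially no redundancy: random bytes of the same length.
random_bytes = os.urandom(len(redundant))

def round_trip(original):
    """Encode, then decode; return the compressed size and the decoded data."""
    compressed = zlib.compress(original)    # encoder
    restored = zlib.decompress(compressed)  # decoder
    return len(compressed), restored

redundant_size, redundant_restored = round_trip(redundant)
random_size, random_restored = round_trip(random_bytes)

print(len(redundant), "->", redundant_size)    # shrinks dramatically
print(len(random_bytes), "->", random_size)    # barely changes, may even grow
```

Both inputs decode back to the original bytes, but only the redundant one gets noticeably smaller.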

History of Data Compression:

- Dates all the way back to 1838 with Morse code, which used shorter code words for "e" and "t"
- Modern work began in the late 1940s with the development of information theory
- Claude Shannon and Robert Fano assigned code words based on the probabilities of blocks
- 1951: David Huffman, Huffman coding (lossless), discussed later
- 1970s: almost all text codecs used the Huffman method
- 1980s: Terry Welch (LZW algorithm), PKZIP
- 1990s: lossy compression became common as digital images became the norm, emerging alongside more sensory-oriented media: images, audio and video

History of Data Compression:

Compression methods progressed largely due to available technology and need.

Image from USENIX: http://www.usenix.org/

Types of Data Compression:

- Lossless: Data is compressed and can be decoded back into its original form with no degradation or distortion.
- Lossy: Compressed data can be decoded into something similar to its original form. Some distortion or loss of data occurs, but the result should still preserve the integrity of the original message.
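A small sketch can make the two definitions concrete. It assumes `zlib` as a stand-in lossless codec and a simple uniform quantizer as a stand-in lossy codec; the sample values, the `STEP` constant and the function names are hypothetical and chosen only for illustration.

```python
import zlib

samples = [12, 13, 15, 200, 201, 199, 90, 91]

# Lossless: decoding returns exactly the original values.
lossless_restored = list(zlib.decompress(zlib.compress(bytes(samples))))

# Lossy: keep only which bucket of width STEP each value falls into.
STEP = 16  # hypothetical quantization step; larger means smaller codes, more distortion

def lossy_encode(values):
    return [v // STEP for v in values]

def lossy_decode(codes):
    # Reconstruct each value as the middle of its bucket.
    return [c * STEP + STEP // 2 for c in codes]

codes = lossy_encode(samples)
approx = lossy_decode(codes)
max_error = max(abs(a - b) for a, b in zip(samples, approx))

print(lossless_restored == samples)  # exact replica
print(approx, "max error:", max_error)
```

The lossy codes need fewer bits per value (4 instead of 8 here), at the price of a bounded reconstruction error.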

So what is the difference?:

Lossless: compressed then decompressed data is an exact replica.
- WinZip
- General purpose (HTTP, ASCII, Morse Code)
- Media (TIFF, PNG, FLAC, MPEG-4 ALS, GIF)
- Very few video codecs are lossless

Lossy: there is some distortion between the original and the reproduction.
- Images (JPEG)
- Video (MPEG)
- Audio (MP3)

Lossless Compression Methods:

- RLE (Run-Length Encoding)
- Huffman method: using tables
- Lempel-Ziv, also LZW (Lempel-Ziv-Welch)
- DEFLATE: used in PKZIP, gzip, PNG
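Of the methods listed, run-length encoding is the simplest to sketch. The version below is a minimal illustration, assuming runs are stored as (character, count) pairs; real formats pack runs differently, and the function names are made up for this example.

```python
from itertools import groupby

def rle_encode(text):
    """Collapse each run of identical characters into a (char, count) pair."""
    return [(char, len(list(run))) for char, run in groupby(text)]

def rle_decode(pairs):
    """Expand (char, count) pairs back into the original string."""
    return "".join(char * count for char, count in pairs)

text = "W" * 6 + "B" * 3 + "W" * 9 + "B"
encoded = rle_encode(text)
decoded = rle_decode(encoded)

print(encoded)  # [('W', 6), ('B', 3), ('W', 9), ('B', 1)]
```

RLE only pays off when the data contains long runs; on text without repeated characters the pair list is larger than the input.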

Lossless Example:

Huffman encoding tree: suppose messages are made of the following 8 symbols, each given a fixed-length code of three bits:

A 000    B 001    C 010    D 011
E 100    F 101    G 110    H 111

With this code, the 18-symbol message BACADAEAFABBAAAGAH is encoded as the string of 54 bits:

001000010000011000100000101000001001000000000110000111

Example from MIT: http://mitpress.mit.edu/sicp/full-text/sicp/book/node41.html
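The 54-bit figure can be checked with a few lines of code. This is only a sketch of the table lookup from the slide; the names `FIXED_CODE`, `encode_fixed` and `decode_fixed` are introduced here for illustration.

```python
# Fixed-length code from the slide: every symbol costs three bits.
FIXED_CODE = {
    "A": "000", "B": "001", "C": "010", "D": "011",
    "E": "100", "F": "101", "G": "110", "H": "111",
}
MESSAGE = "BACADAEAFABBAAAGAH"

def encode_fixed(message):
    return "".join(FIXED_CODE[symbol] for symbol in message)

def decode_fixed(bits):
    reverse = {code: symbol for symbol, code in FIXED_CODE.items()}
    # Every code is exactly three bits, so the stream is cut into triples.
    return "".join(reverse[bits[i:i + 3]] for i in range(0, len(bits), 3))

fixed_bits = encode_fixed(MESSAGE)
print(len(fixed_bits), fixed_bits)
```

With 18 symbols at three bits each, the length is 18 x 3 = 54 bits regardless of how often each symbol occurs.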

Huffman Example Cont.:

Huffman encoding tree: suppose instead that messages use a variable-length code, with shorter codes given to more frequent characters:

A 0       B 100     C 1010    D 1011
E 1100    F 1101    G 1110    H 1111

The same message, BACADAEAFABBAAAGAH, is now reduced to 42 bits, a saving of 12 bits out of 54, or about 22%:

100010100101101100011010100100000111001111

Example from MIT: http://mitpress.mit.edu/sicp/full-text/sicp/book/node41.html
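The variable-length code can be decoded without separators because no code word is a prefix of another. The sketch below assumes the code table from the slide and uses hypothetical helper names; it does not build the Huffman tree itself, it only applies the finished table.

```python
# Variable-length prefix code from the slide: frequent symbols get short codes.
VARIABLE_CODE = {
    "A": "0", "B": "100", "C": "1010", "D": "1011",
    "E": "1100", "F": "1101", "G": "1110", "H": "1111",
}
MESSAGE = "BACADAEAFABBAAAGAH"

def encode(message):
    return "".join(VARIABLE_CODE[symbol] for symbol in message)

def decode(bits):
    reverse = {code: symbol for symbol, code in VARIABLE_CODE.items()}
    symbols, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in reverse:  # a complete code word has been read
            symbols.append(reverse[buffer])
            buffer = ""
    if buffer:
        raise ValueError("bit stream ends in the middle of a code word")
    return "".join(symbols)

bits = encode(MESSAGE)
fixed_length_bits = 3 * len(MESSAGE)        # 54 bits with three bits per symbol
savings = 1 - len(bits) / fixed_length_bits

print(len(bits), bits)
print(f"savings: {savings:.1%}")
```

The message contains nine A's at one bit each, three B's at three bits each, and six other symbols at four bits each: 9 + 9 + 24 = 42 bits.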

Lossy Compression Methods:

- Lossy predictive methods: Previously and/or subsequently decoded data is used to predict the current sample. The error between the predicted and the real data, together with any other information needed to reproduce the prediction, is then quantized and coded.
- Lossy transform codecs: Samples are taken, chopped into small segments, transformed into a new basis space, and quantized. The quantized values are then entropy encoded.
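The predictive approach can be sketched as a simple differential coder: predict each sample from the previously decoded one, then quantize the prediction error. This is a minimal illustration under stated assumptions (the predictor is "same as the last decoded value", the `STEP` size and sample values are made up, and the final entropy-coding stage is left out); it is not the method of any particular codec.

```python
STEP = 4  # hypothetical quantizer step size

def predictive_encode(samples):
    codes = []
    prediction = 0  # the decoder starts from the same value
    for sample in samples:
        error = sample - prediction
        code = round(error / STEP)  # quantize the prediction error
        codes.append(code)
        # Track what the decoder will reconstruct, so errors do not accumulate.
        prediction += code * STEP
    return codes

def predictive_decode(codes):
    decoded, value = [], 0
    for code in codes:
        value += code * STEP
        decoded.append(value)
    return decoded

samples = [100, 102, 105, 109, 108, 104, 101]
codes = predictive_encode(samples)
decoded = predictive_decode(codes)

print(codes)    # after the first sample, the codes are small integers
print(decoded)  # close to the input, but not identical
```

Because neighbouring samples are similar, most codes are near zero, which is what makes the later entropy-coding stage effective.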

Lossy JPEG Example:

Image from: http://www.irinfo.org/articles/article_4_2006_colbert.html

Cost and Availability:

- Depends on the usage requirements: commercial or personal
- Some codecs are open source; others are copyrighted and require royalties
- Much media is not standardized on a particular codec; it's confusing and constantly changing
- Some authors provide encoders and decoders under separate licensing (very common in video codecs)

Pros and Cons:

- Compression is useful for reducing the consumption of costly resources.
- Because decompression adds a processing step, compression is not suitable for every application and can even be detrimental.
- Both ends must have the same encoder/decoder.

Pros and Cons:

Compression usage is a tradeoff involving several factors:
- Space
- Degree of compression
- Extra processing time
- Amount of distortion
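The space versus processing-time tradeoff can be observed directly with a codec that exposes a compression level. The sketch below assumes Python's `zlib` module, whose levels run from 0 (store only) to 9 (smallest output); the test data and the `measure` helper are invented for this example, and the timings will vary from machine to machine.

```python
import time
import zlib

# Deterministic, fairly repetitive sample data.
data = " ".join(str(i % 97) for i in range(20000)).encode("ascii")

def measure(level):
    """Compress at the given level; return (compressed size, seconds taken)."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    # The method is lossless, so there is no distortion at any level.
    assert zlib.decompress(compressed) == data
    return len(compressed), elapsed

results = {level: measure(level) for level in (0, 1, 6, 9)}
for level, (size, seconds) in results.items():
    print(f"level {level}: {size} bytes, {seconds * 1000:.2f} ms")
```

Higher levels generally spend more time to save more space. With a lossy codec, the amount of distortion would become a further factor in the same tradeoff.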

Summary:

- Data compression definition
- Compression is a critical part of information technology
- Consistently growing as new hardware develops
- Data-rich applications demand it
- Lossy versus lossless
- Methodology of compression
- Cost and availability
- Benefits and drawbacks

Thank You:

Please respond with your questions. I will do my best to help from my research.

References:

Bender, Ryan. "Huffman Encoding Trees." MIT Press. N.p., 2000. Web. 22 Nov. 2011. <http://mitpress.mit.edu/sicp/full-text/sicp/book/node41.html>.

Blelloch, Guy E. Introduction to Data Compression. N.p.: Computer Science Department, Carnegie Mellon University, n.d. Print.

Colbert, Fred. "Looking Under the Hood: Converting Proprietary Image File Formats Created within IR Cameras for Improved Archival Use." IR Info. N.p., 2006. Web. 24 Nov. 2011. <http://www.irinfo.org/articles/article_4_2006_colbert.html>.

"Data Compression." Wikipedia. N.p., n.d. Web. 27 Nov. 2011. <http://en.wikipedia.org/wiki/Data_compression>.

Ford, Bryan. "VXA: A Virtual Architecture for Durable Compressed Archives." USENIX. Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 2005. Web. 20 Nov. 2011.

References:

Hanzo, L., and P. J. Streit. Video Compression and Communications. 2nd ed. N.p.: John Wiley and Sons, 2007. Print. Introduction 1.

Lelewer, Debra A., and Daniel S. Hirschberg. Data Compression. University of California, Irvine, n.d. Web. 20 Nov. 2011. <http://www.ics.uci.edu/~dan/pubs/DataCompression.html>.

"Lossless Compression." Maximum Compression. N.p., 2003. Web. 22 Nov. 2011. <http://www.maximumcompression.com/lossless_vs_lossy.php>.

Nelson, Mark. Data Compression. N.p., 2010. Web. 16 Nov. 2011. <http://datacompression.info>.

"Theory." Data Compression. N.p., 2010. Web. 22 Nov. 2011. <http://www.data-compression.com/theory.shtml#ref>.

Wolfram, Stephen. "History of Data Compression." Wolfram Science. N.p., 2002. Web. 22 Nov. 2011. <http://www.wolframscience.com/reference/notes/1069b