Today, most of us store data in the cloud. Cloud storage is typically priced by the amount of data stored, measured in bytes. Because digital storage comes at a cost, it helps to know how files can be compressed into smaller packages.
Reducing the size of a file by encoding its information more compactly is known as compression. File compression falls into two main types: lossless and lossy.
Lossy compression reduces file size by permanently discarding information that is judged unnecessary. In media files such as video, images and audio, the final representation of the data often does not need to be identical to the source. MP3 and JPEG are the most common formats that use lossy compression. An MP3 file, for example, does not hold the complete information of the original audio; sounds that humans can’t hear are removed. Similarly, in JPEG images, non-critical parts of the image are removed. For instance, in a picture of a blue sky, JPEG compression may represent the sky with only one or two shades of blue instead of the dozens of different shades in the original image. Lossy compression is not suitable for files where all the information is critical, in other words, where the file must be preserved exactly as the original.
Fig 1: The images show the reduced sharpness of an image compressed with a lossy technique, compared with the original image
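The blue-sky example can be illustrated with a small sketch. This is not how JPEG actually works internally; it is a simplified stand-in that only shows the lossy idea of mapping many similar shades onto a few. The sample values and the `quantize` helper are hypothetical, chosen purely for illustration.

```python
def quantize(values, levels):
    """Map each 0-255 value to the nearest of `levels` evenly spaced shades."""
    step = 255 / (levels - 1)
    return [round(round(v / step) * step) for v in values]

# Hypothetical blue-channel values for a strip of sky (illustrative only).
sky = [200, 203, 207, 211, 214, 219, 223, 228]

# Keep only 4 possible shades across the whole 0-255 range.
compressed = quantize(sky, levels=4)

distinct_before = len(set(sky))         # eight different shades
distinct_after = len(set(compressed))   # far fewer shades remain
```

After quantization the original eight shades cannot be recovered from `compressed`, which is exactly what makes the technique lossy.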
Lossless compression reduces the file size while still allowing the original file to be reconstructed perfectly, without any loss of information. Essentially, lossless compression removes only redundancy, that is, repetition that can be described more compactly. In contrast to lossy compression, it does not discard any information from the file being compressed. The basic principle of lossless compression algorithms is easy to grasp. Imagine a file made of six M’s in a row, which would look like this: “MMMMMM”. You could compress that to take up less space by replacing those six characters with something like “6M”. Formats such as PNG, FLAC and ZIP use lossless compression, whereas MP3 and JPEG are lossy.
Fig 2: The images show how the number of blocks can be reduced by assigning a number to each colored block using a lossless technique
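The “MMMMMM” to “6M” idea is commonly called run-length encoding. A minimal sketch is given below; the function names `rle_encode` and `rle_decode` are illustrative, and real lossless formats use more elaborate schemes than this.

```python
from itertools import groupby

def rle_encode(text):
    """Collapse each run of identical characters into a (count, char) pair."""
    return [(len(list(group)), char) for char, group in groupby(text)]

def rle_decode(pairs):
    """Rebuild the original text by repeating each char `count` times."""
    return "".join(char * count for count, char in pairs)

encoded = rle_encode("MMMMMM")    # a single pair: six M's
restored = rle_decode(encoded)    # identical to the input string
```

Because decoding gives back exactly the input, no information is lost. Note that this simple scheme only saves space when the data contains long runs of repeated characters.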
Generally, use lossy compression if you can accept some loss of information, so that the result is not exactly the same as the source file, and use lossless compression when you need to reconstruct the original file perfectly. On Windows, 7-Zip is a free program that can compress files into several archive formats, and PeaZip is another free program that supports many archive formats.
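To see the perfect reconstruction that lossless tools promise, the following sketch uses Python's standard `zlib` module as a stand-in for an archiver such as 7-Zip or PeaZip (this is an assumption for illustration; those programs are separate tools with their own formats). The sample data is made up and deliberately repetitive.

```python
import zlib

# Made-up, highly repetitive sample data (illustrative only).
original = b"MMMMMM" * 1000

# Compress at the highest compression level (9), then decompress.
packed = zlib.compress(original, 9)
unpacked = zlib.decompress(packed)

# Lossless: the round trip gives back exactly the same bytes.
is_identical = unpacked == original
size_before = len(original)
size_after = len(packed)
```

Here `is_identical` is true and `size_after` is much smaller than `size_before`; how much is saved depends heavily on how repetitive the data is.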
Authored By:
Amogh Deshmukh,
Assistant Professor,
School of Technology
Woxsen University