
Smart Image Compression Algorithm Behind JPEG


Computer Science

Tech

Computer science as a field grew out of mathematics. In fact, there was no field called "computer science" until the 1960s, and jobs related to it (anything involving computer programming) were given to mathematicians. For most algorithms, knowing the mathematical workings is enough to implement them in a programming language. But while most of the algorithms we come across in computer science require only mathematical and programming knowledge, the compression algorithm behind JPEG also takes advantage of how the human eye and brain work. These optimizations help decrease the actual image size by more than 95%. Let's see how the algorithm behind JPEG works.

<h2>What is Lossy Compression?</h2>

The JPEG algorithm is a lossy compression algorithm. <b>This means that the JPEG algorithm deliberately loses information to decrease the size of the file.</b> Though losing information decreases the quality of the image (we cannot bring the original data back), JPEG images look almost the same as the original. So what kind of information can we afford to lose in an image? As mentioned earlier, we need some knowledge of the human visual system to answer this. Research on the human visual system has shown that <b>human eyes are more sensitive to brightness than to color</b>. This means that if we change the color of the pixels of an image by a bit, we will (very probably) not notice it, but if we change the brightness of the pixels, the difference will be a lot more noticeable.

<h2>RGB and YCbCr Color Spaces</h2>

One of the most famous color spaces is the RGB (Red, Green, Blue) color space. Image data on our computers is stored in this format. But another color space that is especially important for the JPEG compression algorithm is the YCbCr color space. It's not that we can't work with RGB, but YCbCr makes the job easier. <b>Y, Cb, and Cr stand for Luma, Chroma Blue, and Chroma Red</b>. <i>Luma basically represents the brightness of the image</i>. Hence, if we tamper with the Cb and Cr values of an image a bit and keep the Y values the same, the image should look the same. The first step in JPEG compression is therefore to convert RGB values to YCbCr values (a small sketch of this conversion appears after this section).

<h2>Chroma Downsampling and Chroma Subsampling</h2>

This is the step where we deliberately lose some image data for compression. Each pixel of the image has Y, Cb, and Cr values. In <b>chroma subsampling</b>, we start from the top-left corner and select a 2 × 2 pixel block. We average the Cb and Cr values of the 4 pixels and replace their Cb and Cr values with that average. <b>Note that the Y values are not changed.</b> In <b>chroma downsampling</b>, instead of averaging, we replace the Cb and Cr values of all the pixels in the selected 2 × 2 block with the Cb and Cr values of its top-left pixel. We repeat this step over the whole image. After this step, we have discarded 75% of the Cb values and 75% of the Cr values of the original image. <b>This reduces the image size by 50%.</b> (Each pixel originally carries three values; keeping every Y value but only a quarter of the Cb and Cr values leaves 1.5 values per pixel on average.) After just one step we have halved the image size, and the image still looks the same as the original. Both variants are sketched below.
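As a rough sketch of the color-space conversion described above, the function below converts a single 8-bit RGB pixel to YCbCr using the standard JFIF (BT.601-derived) coefficients. The function name and stand-alone structure are my own, for illustration only, and are not taken from any particular JPEG library.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to YCbCr (JFIF / BT.601 coefficients)."""
    y  = 0.299 * r + 0.587 * g + 0.114 * b               # luma (brightness)
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b     # blue-difference chroma
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b     # red-difference chroma
    return y, cb, cr

# Example: a pure red pixel has fairly low luma but maximal red chroma.
print(rgb_to_ycbcr(255, 0, 0))   # roughly (76.2, 85.0, 255.5); real encoders clamp to [0, 255]
```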
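And here is a minimal sketch of the 2 × 2 chroma averaging, assuming the Cb and Cr planes are NumPy arrays with even height and width; the helper name is hypothetical.

```python
import numpy as np

def subsample_chroma(cb, cr):
    """Average Cb and Cr over non-overlapping 2x2 blocks (Y is left untouched).

    cb, cr: 2-D arrays, one chroma value per pixel, with even height and width.
    Returns half-resolution planes: every 4 chroma samples become 1.
    """
    h, w = cb.shape
    # View each plane as a grid of 2x2 blocks and average within each block.
    cb_small = cb.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    cr_small = cr.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return cb_small, cr_small

# Chroma *downsampling*, as described above, would instead keep only the
# top-left value of every 2x2 block: cb[::2, ::2] and cr[::2, ::2].
```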
<h2>Discrete Cosine Transformation (DCT)</h2>

The Discrete Cosine Transformation, or DCT, is one of the most widely used algorithms in signal processing. <b>It expresses a signal as a sum of cosine waves whose frequencies are integer multiples of a base frequency.</b>

Now we will look at an image as a discrete signal, where the Y, Cb, and Cr values of the pixels in a row represent the amplitude of the signal at various points. Obviously, we don't know a mathematical formula for the signal that a row of pixels represents. This is where the DCT comes in. <b>With the help of the DCT we find an expression that CAN represent the discrete signal</b>. But why do we need to represent an image in signal form? As we said, an image can be represented as signals; these signals are sums of cosine waves, and so they have frequencies. If we have an image of size <i>n × n</i> pixels and we compute the DCT along the rows and columns, we get another matrix of size <i>n × n</i>. In this matrix, the upper-left corner corresponds to low-frequency content, and as we move down and to the right we move towards high-frequency content. Another important finding about the human eye is that it <b>is not very sensitive to high-frequency detail</b>. This means that even if we remove the components corresponding to high frequencies, the image will look the same. This is why we need to represent images in signal form.

<h2>2-D DCT in Images</h2>

As the values of Y, Cb, and Cr lie between 0 and 255, we first subtract 128 from all of them so that the values are centred around zero and can be represented as a sum of cosine waves. We then find the DCT of each row and then the DCT of each column to get a matrix of the same size as the image. Note that this step is applied to the Y, Cb, and Cr values separately. As the DCT has been applied to the rows and then to the columns, we call it a 2-D DCT. Next we reduce the values corresponding to high-frequency components to 0: we divide the values in the 2-D DCT matrix by some scalar values and round them off. <b>After this step almost 75% of the values in Y, Cb, and Cr are reduced to 0</b>. These scalar values are defined in a <b>quantization table</b>. The best values for the quantization table have been found after a lot of research in the field and can be referenced online. (A sketch of this step appears at the end of the article.)

<h2>Huffman Encoding</h2>

Huffman encoding is an invertible, lossless compression algorithm. Instead of writing every value of Y, Cb, and Cr with the same number of bits, it assigns shorter bit codes to the values that occur most often; in JPEG it is combined with run-length encoding of the long runs of zeros left over from quantization. As most of the values have been turned into 0 in the previous step, this makes the encoding a very useful and efficient way of compressing the image data. All these steps together decrease the image size to about 5% of the original size. (A small illustration of Huffman code construction is given at the end of the article.)

<h2>Image Decoding</h2>

Apart from the steps where we deliberately lost data, the rest of the steps described above are invertible. To decode, we first use Huffman decoding to turn the stored data back into an <i>n × n</i> coefficient matrix. We multiply all the values by the corresponding entries of the quantization table to get back an (approximate) 2-D DCT matrix. We then apply the inverse DCT to get an image with YCbCr values, and finally convert the YCbCr pixel values to RGB values to display the image on the screen. Even after losing data, the image looks like the original image. The sketches below walk through these steps on a single block.
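The sketch below illustrates the 2-D DCT and quantization step on a single block. In real JPEG encoders this is done on 8 × 8 blocks of each level-shifted channel using the quantization tables published with the standard; here the DCT-II is hand-rolled for clarity, and the quantization table is a made-up one that merely grows towards high frequencies.

```python
import numpy as np

def dct_1d(x):
    """Orthonormal DCT-II: express a 1-D signal as cosine-wave amplitudes."""
    n = len(x)
    k = np.arange(n)
    # basis[k, m] = cos(pi * (m + 0.5) * k / n) -- a cosine of "frequency" k.
    basis = np.cos(np.pi * (np.arange(n)[None, :] + 0.5) * k[:, None] / n)
    scale = np.where(k == 0, np.sqrt(1.0 / n), np.sqrt(2.0 / n))
    return scale * (basis @ x)

def dct_2d(block):
    """Apply the 1-D DCT to every row, then to every column."""
    rows = np.apply_along_axis(dct_1d, 1, block)
    return np.apply_along_axis(dct_1d, 0, rows)

# Toy quantization table: divisors grow towards high frequencies
# (real encoders use the tables published with the JPEG standard).
Q = 16 + 8 * np.add.outer(np.arange(8), np.arange(8))

# A smooth 8x8 gradient stands in for a block of a real photo,
# already level-shifted by subtracting 128.
block = np.add.outer(np.arange(8), np.arange(8)) * 16.0 - 112.0
quantized = np.round(dct_2d(block) / Q).astype(int)
print(quantized)   # only a few low-frequency entries (top-left area) remain non-zero
```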
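To make the entropy-coding idea concrete, here is a generic sketch of Huffman code construction: values that occur more often receive shorter bit strings. This is not the exact table-driven scheme the JPEG standard specifies (which pairs run-length encoding of zero runs with Huffman coding), just the underlying principle.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a Huffman code: frequent symbols get shorter bit strings."""
    freq = Counter(symbols)
    if len(freq) == 1:                       # degenerate case: one distinct symbol
        return {next(iter(freq)): "0"}
    # Each heap entry: (frequency, tie-breaker, {symbol: code-so-far}).
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)      # two cheapest subtrees
        f2, i2, c2 = heapq.heappop(heap)
        # Prefix one subtree's codes with 0, the other's with 1, then merge.
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, i2, merged))
    return heap[0][2]

# Quantized DCT output is dominated by zeros, so 0 gets the shortest code.
data = [0, 0, 0, 0, 0, 0, 5, 0, 0, -3, 0, 0, 0, 1, 0, 0]
print(huffman_code(data))
```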
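Finally, a sketch of the decoding side: multiply the quantized coefficients back by the quantization table and apply the inverse DCT to recover approximate pixel values. The 4 × 4 table and coefficient values below are made up purely for illustration.

```python
import numpy as np

def idct_1d(X):
    """Inverse of the orthonormal DCT-II: rebuild a signal from cosine amplitudes."""
    n = len(X)
    k = np.arange(n)
    scale = np.where(k == 0, np.sqrt(1.0 / n), np.sqrt(2.0 / n))
    # basis[m, k] = cos(pi * (m + 0.5) * k / n)
    basis = np.cos(np.pi * (np.arange(n)[:, None] + 0.5) * k[None, :] / n)
    return basis @ (scale * X)

def idct_2d(coeffs):
    """Undo the 2-D DCT: inverse-transform the columns, then the rows."""
    cols = np.apply_along_axis(idct_1d, 0, coeffs)
    return np.apply_along_axis(idct_1d, 1, cols)

# Made-up 4x4 quantization table and quantized coefficients, standing in for
# what Huffman decoding would give us back from the file.
Q = np.array([[ 8, 12, 16, 24],
              [12, 16, 24, 32],
              [16, 24, 32, 48],
              [24, 32, 48, 64]], dtype=float)
quantized = np.array([[30, -4, 1, 0],
                      [-5,  2, 0, 0],
                      [ 1,  0, 0, 0],
                      [ 0,  0, 0, 0]])

# Dequantize, invert the DCT, and undo the -128 level shift.
pixels = idct_2d(quantized * Q) + 128
print(np.clip(np.round(pixels), 0, 255))   # approximate Y (or Cb/Cr) values
```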

- Ojas Srivastava, 12:56 AM, 09 Mar, 2022

Image Compression