Differentiable JPEG Algorithm

Published on: March 28, 2024

Color Space Conversion

Most images are stored in the RGB (Red, Green, and Blue) color space. For JPEG we need to convert it into YCbCr format. In this format, luminance information is stored as a single component (Y) and chrominance information is stored as two color-difference components (Cb and Cr). For conversion we use the following formula:

Chroma Subsampling

Human eyes are more sensitive to brightness than color information which can also be viewed in above figure. This characteristic of the human visual system allows us to modify some of the components without significantly impacting perceived image quality. Y (Luminance) channel preserves the fine details of the original image, while the Cb and Cr (Chrominance) channels contain primarily color information. JPEG downsamples the Cb and Cr channels by a factor of two in each dimension.

Block Splitting

Each channel is divided into 8×8 blocks of 64 pixels. The subsequent steps of the compression algorithm deals with each 8 × 8 block independently.

Discrete Cosine Transformation (DCT)

First, each value must be subtracted with 128 to make the value range from −128 to +127. DCT transforms an 8 × 8 block of pixels into linear combination of 64 patterns which are given by figure below. The DCT yields a weight matrix indicating how much each base image contributes to the formation of the source image.

Quantization

Human eyes are not that well equiped to view the high frequency elements in an image. After chroma subsampling this is the step where information is lost. The weight matrix is divided by a precalculated quantization table. Quantization matrix Q ∈ 8 × 8 is used to create Di,j = ⌊ Ci,j / Qi,j ⌉, ⌊.⌉ represents rounding to the nearest integer. Within the JPEG quantization matrix, higher values are typically concentrated in the bottom right quadrant. Mathematically, these higher Q values in the lower-right region result in a significant number of elements in the DCT (Discrete Cosine Transform) domain (represented by D) becoming zero. Consequently, this effectively removes high-frequency details from the image, such as sharp edges and intricate textures.

Approximate Rounding Function

The non-differentiability of the rounding function used in JPEG quantization impedes the application of gradient-based optimization techniques. To address this challenge, differentiable surrogate functions are used that closely approximate the rounding behavior. Examples of suitable surrogate functions include linear, Fourier, or polynomial functions.

Decoding

For obtaining the RGB image, all the steps are performed in reverse order. Inverse quantization involves element-wise multiplication with the quantization matrix. This process is lossy because original values cannot be recovered perfectly. Inverse discrete cosine transformation reverses the DCT transformation applied during compression. Finally, we perform YCbCr to RGB conversion which is an affine transformation involving matrix multiplication and bias addition.

References