swee7zania/Image-compression

Visual Perception-Inspired Image Transmission for Intelligent Mobile Devices Using FMQM

With the rapid spread of intelligent mobile devices such as drones, smartphones, and dashcams, efficient image transmission has become a critical challenge. High-resolution images generated by these devices are often constrained by limited network bandwidth, storage capacity, and real-time processing requirements. This paper proposes an image compression algorithm that uses perceptual coding and introduces a Flexible Modulus Quantization Method (FMQM), inspired by the Five Modulus Method (FMM). FMQM simplifies the data by rounding pixel values to multiples of a modulus, and the modulus can be configured flexibly per channel, which makes the image data better suited to Run-Length Encoding (RLE) while minimizing visual impact. After FMQM quantization, RLE is applied to exploit redundancy in low-frequency regions, achieving visually lossless compression at low computational cost. Finally, from the perspective of human vision, Image Quality Assessment (IQA) metrics such as SSIM are used to verify that the compressed image quality meets expectations.

Keywords—Flexible Modulus Quantization Method (FMQM), Image compression, Perceptual coding, Real-time transmission, Run-Length Encoding (RLE).

1. Introduction

The proliferation of intelligent mobile devices such as drones, smartphones, and dashcams has significantly increased the demand for real-time image transmission. These devices often capture high-resolution images, which provide more visual detail but pose challenges for storage capacity and network bandwidth. An efficient image compression method is urgently needed: one that meets constraints on transmission speed, storage, and computational resources while ensuring high-quality image reconstruction.

This paper introduces a flexible image compression algorithm tailored for such resource-constrained environments. The proposed method is based on perceptual coding, which leverages the characteristics of human visual perception to achieve visually lossless compression. By exploiting the human eye's varying sensitivity to luminance and chrominance, the algorithm selectively preserves critical visual details while compressing less perceptible channels. This ensures that essential image information remains intact while reducing redundancy in less sensitive channels.

At the core of this algorithm is the Flexible Modulus Quantization Method (FMQM), used together with Run-Length Encoding (RLE). FMQM extends the traditional Five Modulus Method (FMM) by introducing dynamic modulus selection. The design is inspired by prior work highlighting the advantages of the YCbCr color space for concentrating energy into the luminance channel. Unlike FMM, which applies a fixed modulus across all channels, FMQM assigns smaller modulus values to the luminance (Y) channel to retain crucial details and larger modulus values to the chrominance (Cb and Cr) channels to reduce data size. This flexibility allows FMQM to balance compression efficiency and visual quality, adapting to the specific requirements of different application scenarios.

RLE is applied after FMQM quantization. It effectively compresses the redundancy inherent in the chrominance channels, where regions of identical values are common. Finally, image quality is evaluated using image quality assessment methods such as SSIM and PSNR. This combination of FMQM and RLE not only achieves high compression ratios but also maintains low computational complexity, making it well suited to real-time transmission from intelligent mobile devices such as drones.

The data in this paper are from Rawzor - Lossless compression software for camera raw images (https://imagecompression.info/test_images/).

2. Scenario Description

Assume a practical application scenario: a wildlife protection monitoring system, in which multiple drones are used to capture images of wildlife and transmit them to a central monitoring station in real time. The system has the following constraints:

  • Image resolution requirements: Since wildlife live in a wide area, using larger pictures facilitates better observation of wildlife. Therefore, in the original data set of the project, each image is at least 1920×1080.
  • Network bandwidth limitations: Assume that drones in some areas still transmit data through 4G networks with an average bandwidth of 5 Mbps.
  • Transmission delay requirements: For real-time monitoring, the transmission time of each image must not exceed 2 seconds.
  • Storage space limitations: Drones have limited local storage capacity, so high compression rates allow drones to save more data and keep them working longer.
  • Quality requirements: To avoid hindering the work of animal researchers, the important information in the image should be retained to the greatest extent possible. Therefore, the PSNR (peak signal-to-noise ratio) of the decompressed image must be at least 30 dB.

Given these constraints, efficient image compression is essential to reduce data size while ensuring visual quality and meeting transmission and storage requirements. Specifically, the compression algorithm must achieve:

  • A compressed bit rate of 2 Mbps or lower.
  • A compression ratio of at least 50%, e.g., reducing the size of a 3 MB image to 1200 KB or less.
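
The link between the bandwidth and delay constraints and a size target can be checked with a few lines of arithmetic. The sketch below uses the scenario's 5 Mbps and 2 s figures; decimal units (1 Mbps = 10^6 bit/s, 1 KB = 1000 bytes) and the absence of protocol overhead are assumptions, and the function names are illustrative only.

```python
# Scenario figures from the constraint list; decimal units are an assumption.
BANDWIDTH_BPS = 5_000_000   # 5 Mbps average 4G bandwidth
MAX_DELAY_S = 2.0           # per-image transmission limit


def max_image_bytes(bandwidth_bps: float, max_delay_s: float) -> float:
    """Largest payload in bytes that fits the delay budget (no protocol overhead)."""
    return bandwidth_bps * max_delay_s / 8


def transmission_time_s(size_bytes: float, bandwidth_bps: float) -> float:
    """Time in seconds to send size_bytes over a link of bandwidth_bps."""
    return size_bytes * 8 / bandwidth_bps


budget = max_image_bytes(BANDWIDTH_BPS, MAX_DELAY_S)
print(f"Budget per image: {budget / 1000:.0f} KB")
print(f"1200 KB image: {transmission_time_s(1_200_000, BANDWIDTH_BPS):.2f} s")
```

Under these assumptions the per-image budget is 1250 KB, so a 1200 KB compressed image fits within the 2-second limit with a small margin.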

3. Detailed Design

3.1 Encoding Pipeline

Step 1. Extract Raw Data. Raw pixel data is extracted from images in the PPM format. The PPM format stores image data along with a header containing metadata. The extraction process involves the following steps:

  • Read the image header to extract essential metadata, such as the width, height, and maximum color value of the image.
  • Record the offset where the binary pixel data begins.
  • Read the data starting from that offset and save it in RAW format for further processing.
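
The header-reading step can be sketched as follows. This is a minimal parser for binary PPM (P6) headers written for illustration, not the repository's read_ppm; the function name parse_ppm_header and its return layout are assumptions.

```python
def parse_ppm_header(data: bytes):
    """Parse a binary PPM header.

    Returns (magic, width, height, max_color, binary_start).
    """
    tokens, pos = [], 0
    while len(tokens) < 4:
        # Skip whitespace between header fields.
        while data[pos:pos + 1].isspace():
            pos += 1
        # Skip comment lines, which start with '#'.
        if data[pos:pos + 1] == b"#":
            while data[pos:pos + 1] not in (b"\n", b""):
                pos += 1
            continue
        start = pos
        while pos < len(data) and not data[pos:pos + 1].isspace():
            pos += 1
        tokens.append(data[start:pos])
    magic = tokens[0].decode("ascii")
    width, height, max_color = (int(t) for t in tokens[1:])
    # Exactly one whitespace byte separates the header from the pixel data.
    return magic, width, height, max_color, pos + 1


sample = b"P6\n# comment\n2 1\n255\n" + bytes([5, 13, 26, 1, 2, 3])
magic, width, height, max_color, binary_start = parse_ppm_header(sample)
raw_pixels = sample[binary_start:]  # headerless RAW payload
```

Slicing from binary_start leaves only the interleaved pixel bytes, which is what the RAW file in this step contains.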

Step 2. Calculate entropy. Shannon entropy is computed to quantify the average information or uncertainty in the raw image data. This metric provides insight into the redundancy present in the data. Lower entropy means more redundancy, making the data easier to compress. This helps to check how well compression reduces data size.
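
As an illustration of this step, the sketch below computes Shannon entropy in bits per symbol. It treats each byte as one symbol, which is an assumption: the source does not state which symbol definition its reported entropy values use.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per symbol, treating each byte as one symbol."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# A constant signal carries no information; two equally likely values carry 1 bit.
flat = shannon_entropy(bytes([7] * 10))
two_level = shannon_entropy(bytes([0, 0, 1, 1]))
```

With byte symbols the value lies between 0 and 8 bits; data that repeats a few values often sits near the low end and is a better candidate for RLE.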

Step 3. RGB to YCbCr. To align with human visual perception and enhance compression efficiency, the input image is transformed from the RGB color space to the YCbCr color space. The YCbCr model separates luminance (Y) from chrominance (Cb and Cr), allowing the algorithm to apply different compression levels to each component based on their perceptual importance. The transformation process involves the following steps:

  • The raw binary data is read and reshaped into an array of dimensions (height, width, 3), representing the RGB channels of the image.
  • Using the PIL library, the RGB array is converted to the YCbCr color space.
  • The transformed data is split into three independent channels: Y (luminance) represents brightness and is crucial for preserving visual detail; Cb and Cr (chrominance) represent color differences, to which the human eye is less sensitive.
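
The source delegates the conversion to PIL, so the per-pixel arithmetic is not shown. For reference, the sketch below uses the full-range BT.601 equations from the JFIF convention used by JPEG. That Pillow's YCbCr mode follows the same convention is an assumption here, and small rounding differences from Pillow's integer implementation are possible.

```python
def rgb_to_ycbcr_pixel(r: int, g: int, b: int):
    """Full-range (JFIF) BT.601 conversion of one 8-bit RGB pixel."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b

    def clamp(v: float) -> int:
        # Round to the nearest integer and keep the result in the 8-bit range.
        return max(0, min(255, round(v)))

    return clamp(y), clamp(cb), clamp(cr)


# Neutral grays carry no color difference, so Cb and Cr sit at the midpoint 128.
gray = rgb_to_ycbcr_pixel(100, 100, 100)
```

The gray example shows why chrominance channels compress well: wherever color varies slowly, Cb and Cr stay close to constant.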

Step 4. FMQM Quantization. Flexible Modulus Quantization Method (FMQM) is used to reduce the data size of each channel while maintaining visual quality. This method leverages modulus-based quantization to group pixel values into discrete levels. By applying different modulus values to the luminance (Y) and chrominance (Cb and Cr) channels, FMQM balances compression efficiency with perceptual fidelity. The quantization process involves the following steps:

  • Each pixel value in a channel is divided by the modulus, rounded to the nearest integer, and then multiplied by the modulus, so that every value becomes the nearest multiple of the modulus.
  • Set the modulus for each channel. Apply a smaller modulus (e.g., m=1) to the luminance (Y) channel to retain critical visual details, and a larger modulus (e.g., m=8) to the chrominance (Cb, Cr) channels to compress less sensitive color information.
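
The quantization rule described above can be sketched with NumPy. The function name quantize_channel is illustrative, and clipping the result to the 8-bit range is an assumption, since rounding up near 255 can otherwise overshoot.

```python
import numpy as np


def quantize_channel(channel: np.ndarray, modulus: int) -> np.ndarray:
    """Round every value to the nearest multiple of `modulus`."""
    if modulus < 1:
        raise ValueError("modulus must be a positive integer")
    # np.round rounds exact halves to the nearest even integer.
    scaled = np.round(channel.astype(np.float64) / modulus) * modulus
    # Assumption: keep the result representable as 8-bit samples.
    return np.clip(scaled, 0, 255).astype(np.uint8)


cb = np.array([[0, 3, 4, 12, 255]], dtype=np.uint8)
cb_q = quantize_channel(cb, 8)   # coarse step for chrominance
y_q = quantize_channel(cb, 1)    # modulus 1 leaves values unchanged
```

A larger modulus maps more neighboring values onto the same level, which lengthens runs of identical values at the cost of larger quantization error.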

Step 5. Recalculate Entropy. After quantization, the entropy of each channel is recalculated to evaluate the compression potential.

Step 6. Run-Length Encoding. The quantized channels are compressed using RLE, which encodes sequences of repeated values as a pair: the value and its count. The RLE encoding process involves the following steps:

  • Flattens each channel into a 1D array.
  • Iterates through the array, counting consecutive identical values.
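
A minimal sketch of the encoding loop, using only the standard library. The name rle_encode is illustrative; a NumPy channel would first be flattened row by row (for example with channel.ravel()), which the nested-list flattening below stands in for.

```python
from itertools import groupby


def rle_encode(values):
    """Return [(value, run_length), ...] for a flat sequence of values."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


channel = [[5, 5, 5], [2, 2, 9]]
flat = [v for row in channel for v in row]   # row-major flattening
pairs = rle_encode(flat)
```

Each run costs one pair regardless of its length, so the scheme only pays off when quantization has produced long runs.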

Step 7. Encoded Data. Stores the compressed result as a binary file.

3.2 Decoding Pipeline

Step 1. Encoded Data. Read the binary file saved after image compression.

Step 2. Run-Length Decoding. Restore RLE-encoded binary data to original data. The decompression process involves the following steps:

  • Take as input a list of (value, count) pairs, where each pair gives a value and the length of its run in the original data, together with the dimensions of the original data array.
  • An empty array of the same shape as the original image is initialized.
  • Fill the corresponding positions in the array to restore the original values.
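
The decoding steps can be sketched as below. The name rle_decode and the nested-list output are illustrative choices, and the length check is an added safeguard rather than something the source describes.

```python
def rle_decode(pairs, shape):
    """Expand (value, count) pairs into a height x width nested list."""
    height, width = shape
    flat = []
    for value, count in pairs:
        flat.extend([value] * count)
    if len(flat) != height * width:
        raise ValueError("run lengths do not match the target shape")
    # Rebuild rows in the same row-major order used when flattening.
    return [flat[r * width:(r + 1) * width] for r in range(height)]


restored = rle_decode([(5, 3), (2, 2), (9, 1)], (2, 3))
```

RLE itself is lossless, so the decoder recovers the quantized channel exactly; all loss in the pipeline comes from the quantization step.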

Step 3. YCbCr to RGB. The image is reconstructed by converting the modified YCbCr data back to the RGB color space. The transformation process involves the following steps:

  • The quantized Y, Cb, and Cr channels are stacked along the third dimension to form a single array representing the YCbCr image.
  • Using the PIL library, the stacked YCbCr array is converted into an RGB image. The PIL library handles the mathematical operations required to map YCbCr values to their corresponding RGB values based on standard transformation matrices.

Step 4. Output Image. Output the restored RGB image to a common image format, such as PNG.

Step 5. Image Quality Assessment. To evaluate the quality of the compressed images, two metrics are used: Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM). These metrics provide quantitative measures of the visual fidelity of the decompressed image compared to the original.
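
For reference, PSNR follows directly from the mean squared error (MSE) between the two images: PSNR = 10 · log10(MAX² / MSE), where MAX is 255 for 8-bit samples. The sketch below works on flat sequences of samples and is illustrative only; the source does not show how it computes PSNR or SSIM, and SSIM is omitted here because it is considerably more involved.

```python
import math


def psnr(original, restored, max_value: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between two equal-length sample sequences."""
    if len(original) != len(restored):
        raise ValueError("inputs must have the same number of samples")
    mse = sum((a - b) ** 2 for a, b in zip(original, restored)) / len(original)
    if mse == 0:
        return math.inf  # identical inputs
    return 10 * math.log10(max_value ** 2 / mse)


# Every sample off by one level gives an MSE of 1.
off_by_one = psnr([10, 10, 10, 10], [11, 11, 11, 11])
```

Because the scale is logarithmic, halving the MSE raises PSNR by about 3 dB.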

4. Compression Results

4.1 Extract raw data

In the case of cathedral.ppm, I removed the header data and kept only the original binary image data.


4.2 Quantization and Entropy

In the case of nightshot_iso_1600.ppm, I calculated the original entropy and the entropy after FMQM quantization.

| Raw Entropy | Entropy After FMQM Quantization |
| --- | --- |
| 15.46 bits | Y channel: 4.94 bits; Cb channel: 2.33 bits; Cr channel: 1.66 bits |

Here the Y channel modulus is 3, and the Cb and Cr channel modulus is 10. The original image has higher entropy and therefore less room for compression. After FMQM quantization, the entropy is significantly lower, leaving more room for compression.

This shows that FMQM quantization is effective: the higher the modulus, the lower the entropy.

4.3 Before and After

The comparison between the original and compressed images is as follows:

| Original Image | Compressed File | FMQM Modulus | Compression Ratio | PSNR | SSIM |
| --- | --- | --- | --- | --- | --- |
| nightshot_iso_1600.raw (21609 KB) | compressed_data.npz (8651 KB) | Y:3; Cb:9; Cr:9 | ≈ 60% | ≈ 30.02 dB | ≈ 0.951 |
| flower_foveon.raw (10047 KB) | compressed_data.npz (1481 KB) | Y:2; Cb:10; Cr:10 | ≈ 85% | ≈ 31.16 dB | ≈ 0.985 |
| cathedral.raw (17625 KB) | compressed_data.npz (5978 KB) | Y:1; Cb:10; Cr:10 | ≈ 66% | ≈ 30.81 dB | ≈ 0.978 |

The compression results show that the modulus can be adjusted flexibly according to the characteristics of each image. The compression ratio is about 60% or higher for every test image, and the decompressed images are perceived by the human eye as essentially identical to the originals.
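
The compression ratios reported above correspond to the fraction of the original size that is saved, which can be reproduced from the file sizes in the results table. The snippet is only a worked check of that arithmetic; the function name space_saving is illustrative.

```python
# (original KB, compressed KB) copied from the results table.
results = {
    "nightshot_iso_1600": (21609, 8651),
    "flower_foveon": (10047, 1481),
    "cathedral": (17625, 5978),
}


def space_saving(original_kb: float, compressed_kb: float) -> float:
    """Fraction of the original size removed by compression."""
    return 1 - compressed_kb / original_kb


for name, (original_kb, compressed_kb) in results.items():
    print(f"{name}: {space_saving(original_kb, compressed_kb):.1%}")
```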

A Peak Signal-to-Noise Ratio (PSNR) above 30 dB indicates good image quality: the image is suitable for storage or transmission, and the distortion is barely noticeable to the human eye. This indicates that the adopted FMQM+RLE method achieves a good balance between compression efficiency and image quality.

The Structural Similarity Index Measure (SSIM) takes values in the range [0, 1]. The closer the value is to 1, the more similar the two images are. The SSIM of these compressed images lies between roughly 0.95 and 0.99, which means that the images are close to visually lossless and the quality after compression is very close to that of the original image.

5. Conclusion

This paper presents a novel image compression method combining the Flexible Modulus Quantization Method (FMQM) and Run-Length Encoding (RLE). Through a series of experiments, the results show that the proposed FMQM+RLE method is effective in achieving high compression rates while maintaining visual quality, meeting the expectations for image transmission in resource-constrained environments.

The method uses a perceptual coding approach that aligns with human visual sensitivity. It applies lower modulus values to the luminance (Y) channel to preserve critical details and higher modulus values to the chrominance (Cb and Cr) channels to achieve efficient compression. The experimental results validate the feasibility of this approach. Using RGB-8 images provided by Rawzor, the compression rate exceeded 50% in all cases, and both PSNR and SSIM metrics confirmed high-quality reconstruction. For example: (i) With the flower_foveon image, the compressed size was 1481 KB, achieving an 85% compression rate with a PSNR of 31.16 dB and an SSIM of 0.985. The estimated transmission time at a 10 Mbps bandwidth was approximately 1.28 seconds. (ii) For the cathedral image, a compression rate of 55% was achieved without affecting visual perception, with a PSNR of 30.03 dB and an SSIM of 0.975.

The study further highlights that for brighter images (e.g., flower_foveon), higher modulus values can be used across all channels without significant visual quality loss. Conversely, for images with dark areas and strong contrast (e.g., cathedral), a smaller modulus for the luminance channel is necessary to maintain visual fidelity.

In addition, the proposed method met the real-time transmission constraints of a 10 Mbps bandwidth and a maximum transmission time of 2 seconds. This demonstrates its suitability for real-time applications, such as drone-based image transmission, while satisfying all quality and efficiency requirements. These results confirm that the FMQM+RLE method is a flexible and practical solution for image compression in resource-limited scenarios.


Functional Modules

Image Reading Tool

1. Read .ppm

  • Function: read_ppm

    • Reads PPM files and returns reconstructed image arrays.
  • Parameters:

    • file_path: String, path to the PPM file.
  • Returns:

    • img: 3D NumPy array representing the RGB image, with shape (height, width, 3).
    • magic_number: Identifier of the PPM file type (P6 for binary RGB, P5 for grayscale).
    • channels: Number of color channels (3 for RGB, 1 for grayscale).
    • width: Width of the image in pixels.
    • height: Height of the image in pixels.
    • max_color: Maximum color value (usually 255 for 8-bit images).
    • binary_start: Position in the file where binary pixel data begins.
  • Results:

    I read a .ppm dataset and printed its contents. Results are as follows:

    First pixel of the .ppm data: RGB = (5, 13, 26)

    This confirms that the original .ppm image is stored in Band Interleaved by Pixel (BIP) format, where all RGB values of each pixel are stored sequentially, e.g., Pixel1 [R, G, B] → Pixel2 [R, G, B] → Pixel3 [R, G, B].

2. Extract raw data

  • Function: extract_raw_data

    Extracts the binary pixel data from a PPM file and saves it as a raw binary file.

  • Parameters:

    • file_path: Path to the PPM file.
    • binary_start: Position in the file where binary pixel data begins (returned by read_ppm).
    • output_path: Path to save the extracted raw binary data.
  • Steps:

    1. Open the PPM file in binary read mode.
    2. Move the file pointer to the position of binary_start to skip the header.
    3. Read the binary pixel data.
    4. Save the pixel data to the specified output path as a raw file.
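
The four steps above map onto a short function. This is a sketch that follows the documented signature, not the repository's code; returning the number of bytes written is an added convenience and an assumption.

```python
def extract_raw_data(file_path, binary_start, output_path):
    """Copy everything after the PPM header into a headerless raw file."""
    with open(file_path, "rb") as src:
        src.seek(binary_start)      # skip the header
        pixel_data = src.read()     # the rest of the file is pixel samples
    with open(output_path, "wb") as dst:
        dst.write(pixel_data)
    return len(pixel_data)          # assumption: report how many bytes were written
```

For an 8-bit RGB image the returned size should equal width × height × 3, which gives a quick sanity check on binary_start.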

YCbCr and RGB Conversion

1. RGB to YCbCr Conversion

  • Function: rgb_to_ycbcr

    Converts an RGB image to YCbCr color space.

  • Parameters:

    • img: numpy.ndarray, input RGB image.
  • Returns:

    • Y: Luminance channel (numpy.ndarray).
    • Cb: Blue chrominance channel (numpy.ndarray).
    • Cr: Red chrominance channel (numpy.ndarray).
  • Steps:

    • Use Pillow to convert RGB to YCbCr.
    • Convert YCbCr to NumPy array.
    • Separate and return Y, Cb, Cr channels.

2. YCbCr to RGB Conversion

  • Function: ycbcr_to_rgb

    Restores an RGB image from YCbCr.

  • Parameters:

    • Y, Cb, Cr: Channels as numpy.ndarray.
  • Returns:

    • rgb_img: Reconstructed RGB image (numpy.ndarray).
  • Steps:

    • Stack Y, Cb, Cr into a 3D array.
    • Interpret the stacked array as a YCbCr image using Pillow.
    • Convert YCbCr back to RGB.
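
A possible shape for the two conversion functions, using Pillow and NumPy, is sketched below. It is not the repository's code. One deliberate difference from the steps above: instead of stacking the channels into one array first, it builds the YCbCr image with Image.merge from three single-band images, which avoids depending on how Image.fromarray interprets a three-channel array.

```python
import numpy as np
from PIL import Image


def rgb_to_ycbcr(img: np.ndarray):
    """Split an (H, W, 3) uint8 RGB array into Y, Cb, Cr channel arrays."""
    ycbcr = np.asarray(Image.fromarray(img.astype(np.uint8)).convert("YCbCr"))
    return ycbcr[..., 0], ycbcr[..., 1], ycbcr[..., 2]


def ycbcr_to_rgb(Y, Cb, Cr) -> np.ndarray:
    """Rebuild an (H, W, 3) uint8 RGB array from Y, Cb, Cr channel arrays."""
    # Each 2D uint8 array becomes a single-band ("L") image.
    bands = [Image.fromarray(np.ascontiguousarray(c, dtype=np.uint8))
             for c in (Y, Cb, Cr)]
    return np.asarray(Image.merge("YCbCr", bands).convert("RGB"))


img = np.full((4, 4, 3), 100, dtype=np.uint8)
Y, Cb, Cr = rgb_to_ycbcr(img)
round_trip = ycbcr_to_rgb(Y, Cb, Cr)
```

The round trip is not guaranteed to be bit-exact because both conversions round to 8-bit integers, so small per-pixel differences are expected even without quantization.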

3. Test Cases

This is just a utility, but I included a test case in the main function.

  • RGB to YCbCr:
    • Read the original image ../../dataset/rgb8bit/nightshot_iso_1600.ppm.
    • Use rgb_to_ycbcr to separate the image into Y, Cb, and Cr channels.
    • Save the Y, Cb, and Cr channels individually in the ../data/ycbcr directory.
  • YCbCr to RGB:
    • Use ycbcr_to_rgb with the Y, Cb, and Cr channel data.
    • Restore the RGB image and save it to ../data/ycbcr/recover_rgb_pillow.png.

FMM Quantization Tool

1. Module Overview

  • Function: fmm_quantization
    • Performs modulus-based (FMQM) quantization on a given channel.
  • Parameters:
    • channel: numpy.ndarray, representing one channel of an image.
    • modulus: Integer, the modulus used for FMM quantization.
  • Returns:
    • Quantized channel data (numpy.ndarray).
  • Steps:
    1. Divide each value by the modulus, round to the nearest integer, then multiply by the modulus to get the quantized results.
    2. Return the quantized data.

2. Test Case

This is another utility, with a test case provided in the main function.

  • Use read_ppm to read a PPM image from a specified path.
  • Use rgb_to_ycbcr to convert the RGB image to Y, Cb, and Cr channels.
  • Apply FMM quantization to each channel, adjusting the modulus to achieve the desired effect.
  • Save the quantized Y, Cb, and Cr channels as PNG images.
  • Use ycbcr_to_rgb to restore the quantized YCbCr data to an RGB image and save it.

Compression and Decompression

Compression Implementation

1. Run-Length Encoding (RLE)

  • Function: rle_compress
    • Compresses a 2D image channel using RLE (Run-Length Encoding).
  • Parameters:
    • channel: numpy.ndarray, input image channel (2D array).
  • Returns:
    • compressed: List of tuples (value, count), representing the pixel value and its consecutive occurrence count.
  • Steps:
    • Flatten the 2D image channel into a 1D array.
    • Iterate through pixel values, recording consecutive repeated values and their counts.
    • Store the compressed results in a list and return it.

2. Saving Compressed Data

  • Function: save_compressed_data_npz
    • Saves the compressed channel data and image dimensions as a binary .npz file.
  • Parameters:
    • Y_compressed: Compressed Y channel data.
    • Cb_compressed: Compressed Cb channel data.
    • Cr_compressed: Compressed Cr channel data.
    • Y_shape: Original dimensions of the Y channel.
    • filename: String, the name of the output file.
  • Steps:
    • Use numpy.savez_compressed to save the compressed data and image dimensions.
    • Generate a binary .npz file for later decompression and restoration.
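
One way the saving step could look is sketched below, following the documented signature. The array key names (Y, Cb, Cr, shape) and the int32 storage of the (value, count) pairs are assumptions; the source only states that numpy.savez_compressed is used.

```python
import numpy as np


def save_compressed_data_npz(Y_compressed, Cb_compressed, Cr_compressed,
                             Y_shape, filename):
    """Store RLE pairs for each channel plus the channel shape in one .npz file."""
    def as_pairs(pairs):
        # One row per run: column 0 is the value, column 1 is the run length.
        return np.asarray(pairs, dtype=np.int32).reshape(-1, 2)

    np.savez_compressed(
        filename,
        Y=as_pairs(Y_compressed),
        Cb=as_pairs(Cb_compressed),
        Cr=as_pairs(Cr_compressed),
        shape=np.asarray(Y_shape, dtype=np.int32),
    )
```

Note that numpy.savez_compressed applies its own zip/deflate compression on top of the RLE pairs, so the final file size reflects both stages.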

3. Code Execution

To execute the compression, modify the image path in the main function to call the modules and generate a compressed binary file.

  1. Use read_ppm to read an RGB image from ../dataset/rgb8bit/nightshot_iso_1600.ppm.
  2. Use rgb_to_ycbcr to separate the image into Y, Cb, and Cr channels.
  3. Perform FMM quantization on each channel with modulus 4 for Y and modulus 7 for Cb and Cr.
  4. Compress the quantized Y, Cb, and Cr channels using RLE.
  5. Save the compressed data and image dimensions to data/compressed_data.npz using save_compressed_data_npz.

Decompression Implementation

1. Run-Length Decoding

  • Function: rle_decompress
    • Decompresses RLE data to restore 2D image data.
  • Parameters:
    • compressed_data: List of RLE-compressed (value, count) tuples.
    • shape: Tuple representing the shape of the decompressed image (e.g., (height, width)).
  • Returns:
    • decompressed: numpy.ndarray, the decompressed 2D image data.
  • Steps:
    • Initialize a zero-filled array with the specified shape.
    • Iterate over compressed_data, filling the array with each value according to its count.
    • Return the decompressed image data.

2. Restoring Image

  • Function: restore_image_from_npz
    • Restores the original RGB image from a compressed .npz file.
  • Parameters:
    • npz_filename: String, path to the input .npz file.
  • Steps:
    • Load compressed data: Extract compressed Y, Cb, Cr data and image shape from the .npz file.
    • Decompress: Use rle_decompress to restore Y, Cb, and Cr channels.
    • Convert YCbCr to RGB: Use ycbcr_to_rgb to transform the decompressed data back to an RGB image.
    • Save the restored image in a common format such as PNG.

3. Code Execution

Modify the parameter in the main function to point to the path of the compressed binary file when calling restore_image_from_npz. This will decompress the data and reconstruct the RGB image.
