25–26 Jun 2026
"Vasil Levski" National Military University
Europe/Sofia timezone

COMPARATIVE ANALYSIS OF DEEP LEARNING ARCHITECTURES FOR MONOCHROMATIC IMAGE COLORIZATION

Not scheduled
20 min
Veliko Tarnovo, Bulgaria
Paper – Oral Presentation
Information Technology

Speaker

Ihor Panasenko

Description

Colorizing monochromatic images is crucial for enhancing the informativeness of visual data in modern information technology and computer vision systems. However, automatic colorization is an inherently ill-posed problem because color distributions are multimodal: many plausible color assignments exist for any given grayscale input. This paper compares classical algorithms (such as the Welsh and Levin methods) with modern deep learning pipelines. The evaluated architectures range from baseline convolutional neural networks (CNNs) and standalone U-Net models to generative adversarial networks (GANs) and a novel hybrid Fusion GAN that integrates global semantic priors extracted by a ResNet-18 backbone.
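The abstract does not describe how the Fusion GAN injects its ResNet-18 priors. One common pattern in fusion-based colorization networks is to pool the backbone into a single global vector (for ResNet-18 the pooled output is 512-dimensional), copy that vector to every spatial position of the local feature map, and concatenate along the channel axis. The sketch below shows only that broadcast-and-concatenate step, under that assumption, in plain Python; the helper `fuse_global_prior` and the nested-list tensor layout are hypothetical, and a real model would use a deep learning framework and typically follow the concatenation with a 1×1 convolution.

```python
from typing import List

# Nested-list stand-ins for tensors (hypothetical layout, illustration only):
#   Grid        -> H x W x C_local local feature map (e.g. from an encoder)
#   List[float] -> C_global pooled descriptor (e.g. from a ResNet-18 backbone)
Grid = List[List[List[float]]]


def fuse_global_prior(local: Grid, global_vec: List[float]) -> Grid:
    """Hypothetical fusion step: append the global vector to every pixel.

    The output keeps the H x W size of `local` and has
    C_local + C_global channels at each position.
    """
    fused: Grid = []
    for row in local:
        # Broadcast: every spatial position receives an identical copy of
        # the image-level descriptor, concatenated after its local features.
        fused.append([list(features) + list(global_vec) for features in row])
    return fused


# Usage: a 2 x 2 map with 2 local channels plus a 3-dim global prior.
local = [[[0.1, 0.2], [0.3, 0.4]],
         [[0.5, 0.6], [0.7, 0.8]]]
global_vec = [1.0, -1.0, 0.5]
fused = fuse_global_prior(local, global_vec)
print(len(fused), len(fused[0]), len(fused[0][0]))  # 2 2 5
```

Because the global vector is identical at every position, later layers can condition each pixel's color on scene-level context that a purely local receptive field may miss.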

All models were trained and rigorously evaluated in the CIE L*a*b* color space, which separates luminance (L*) from chrominance (a*, b*). To ensure a robust evaluation, experiments were conducted on a diverse benchmark of 2,000 images from the COCO dataset. Quality assessment combined traditional metrics, Peak Signal-to-Noise Ratio (PSNR) and the Structural Similarity Index (SSIM), with the Learned Perceptual Image Patch Similarity (LPIPS) metric, which emphasizes human-like perceptual fidelity.
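The abstract does not specify the conversion routine used. A minimal sketch of the standard sRGB to CIE L*a*b* conversion (D65 white point) illustrates the luminance/chrominance split. In the usual colorization setup, which is assumed here rather than stated in the abstract, the network receives L* and predicts a* and b*. The function names are hypothetical; in practice a library routine such as scikit-image's `skimage.color.rgb2lab` would typically be used instead.

```python
from typing import Tuple

# D65 reference white in CIE XYZ (Y normalized to 1).
XN, YN, ZN = 0.95047, 1.0, 1.08883
DELTA = 6 / 29


def _srgb_to_linear(v: int) -> float:
    """Undo the sRGB transfer curve for one 8-bit channel value."""
    c = v / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def _f(t: float) -> float:
    """CIE L*a*b* companding function: cube root with a linear toe near 0."""
    if t > DELTA ** 3:
        return t ** (1 / 3)
    return t / (3 * DELTA ** 2) + 4 / 29


def srgb_to_lab(rgb: Tuple[int, int, int]) -> Tuple[float, float, float]:
    """Hypothetical helper: convert one 8-bit sRGB pixel to (L*, a*, b*)."""
    rl, gl, bl = (_srgb_to_linear(v) for v in rgb)
    # Linear sRGB -> CIE XYZ under D65.
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl
    fx, fy, fz = _f(x / XN), _f(y / YN), _f(z / ZN)
    lum = 116 * fy - 16        # L*: luminance (assumed network input)
    a = 500 * (fx - fy)        # a*: green-red chrominance (assumed target)
    b = 200 * (fy - fz)        # b*: blue-yellow chrominance (assumed target)
    return lum, a, b


for name, px in [("white", (255, 255, 255)), ("gray", (128, 128, 128)), ("red", (255, 0, 0))]:
    lum, a, b = srgb_to_lab(px)
    print(f"{name:5s} L*={lum:6.2f} a*={a:6.2f} b*={b:6.2f}")
```

Neutral grays map to a* and b* close to zero, so a grayscale image carries essentially no chrominance information and both chrominance channels have to be inferred by the model.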

The experimental results were counterintuitive. While the proposed Fusion GAN surpassed classical methods, baseline CNNs, and standard GANs on most evaluation metrics, the standalone U-Net architecture achieved the highest overall ranking. Specifically, U-Net achieved the top SSIM score of 0.945 (indicating superior structural preservation), the lowest LPIPS of 0.180 (best perceptual quality), and the fastest inference, enabling real-time applications. These findings challenge prevailing assumptions about model complexity, demonstrating that simpler encoder-decoder designs can outperform more intricate generative models in quantitative evaluations of image restoration tasks. Ultimately, this study underscores the critical need for multifaceted evaluation frameworks in deep learning research.
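The scores above follow the usual conventions: higher PSNR and SSIM and lower LPIPS indicate better results. The sketch below computes PSNR and a simplified single-window SSIM on flattened 8-bit channels in plain Python. It is not the authors' evaluation code: standard SSIM averages the same statistic over local windows (as, for example, scikit-image's `skimage.metrics.structural_similarity` does), and LPIPS is omitted because it requires a pretrained network. The function names and toy pixel values are hypothetical.

```python
import math
from typing import Sequence


def psnr(ref: Sequence[float], test: Sequence[float], max_val: float = 255.0) -> float:
    """Peak Signal-to-Noise Ratio in dB (higher is better)."""
    mse = sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_val ** 2 / mse)


def global_ssim(x: Sequence[float], y: Sequence[float], max_val: float = 255.0) -> float:
    """Single-window SSIM over the whole signal (1.0 means identical).

    Simplification (assumption): the original SSIM formulation averages
    this statistic over local windows (11x11 Gaussian) instead of one
    global window, so values will differ from library implementations.
    """
    n = len(x)
    c1 = (0.01 * max_val) ** 2  # stabilizes the luminance term
    c2 = (0.03 * max_val) ** 2  # stabilizes the contrast/structure term
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))


# Toy 6-pixel signals (hypothetical values, not from the paper).
ref = [10.0, 50.0, 90.0, 130.0, 170.0, 210.0]
noisy = [12.0, 47.0, 93.0, 128.0, 173.0, 207.0]
print(psnr(ref, noisy), global_ssim(ref, noisy))
```

PSNR depends only on per-pixel error, whereas SSIM also compares means, variances, and covariance, which is why the two metrics can rank models differently.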

Keywords: computer vision, image colorization, deep learning, GANs, U-Net, CNNs, LPIPS, SSIM, PSNR, COCO dataset