[Teaser (interactive comparisons omitted): examples showing an Old Photo and our Output, produced using Reference 1 and Reference 2.]
This paper presents the first approach to old photo modernization using multiple references, performing stylization and enhancement in a unified manner.
To modernize old photos, we propose a novel multi-reference-based old photo modernization (MROPM) framework consisting of a network, MROPM-Net, and a novel synthetic data generation scheme. MROPM-Net stylizes old photos using multiple references via photorealistic style transfer (PST) and further enhances the results to produce modern-looking images. Meanwhile, the synthetic data generation scheme allows the network to be trained to effectively utilize multiple references for modernization.
To evaluate performance, we propose a new benchmark dataset of old photos (CHD) consisting of diverse natural indoor and outdoor scenes. Extensive experiments show that the proposed method outperforms other baselines in modernizing real old photos, even though no old photos were used during training. Moreover, our method can appropriately select styles from multiple references for each semantic region of the old photo, further improving modernization performance.
Our network consists of two subnetworks to modernize old photos: 1) a shared single-stylization subnet that stylizes the old photo (content) given a single reference (style), and 2) a merging-refinement subnet that merges the multiple stylization results according to semantic similarity and further refines them. Note: all examples in the figure use two references, but our framework can use more than two.
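For concreteness, here is a minimal PyTorch sketch of how the two subnetworks might compose at inference time. The module interfaces and names (`MROPMNet`, `stylize`, `merge_refine`) are our own illustration, not the actual implementation:

```python
import torch
import torch.nn as nn

class MROPMNet(nn.Module):
    """Illustrative composition of the two subnetworks (interfaces are hypothetical)."""

    def __init__(self, stylization_subnet: nn.Module, merging_refinement_subnet: nn.Module):
        super().__init__()
        self.stylize = stylization_subnet        # shared across all references
        self.merge_refine = merging_refinement_subnet

    def forward(self, old_photo: torch.Tensor, references: list) -> torch.Tensor:
        # 1) Stylize the old photo once per reference with the *shared* subnet,
        #    so any number of references (two or more) can be used.
        stylized = [self.stylize(old_photo, ref) for ref in references]
        # 2) Merge the per-reference results by semantic similarity, then refine.
        return self.merge_refine(old_photo, stylized)
```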
In photorealistic style transfer, one can transfer the style of one photo (the reference) to another (the old photo) using AdaIN. AdaIN computes a style code (the channel-wise mean and standard deviation of the style features) and uses it to align the mean and std of the content features. Instead of computing the style code directly, our core idea is to predict it. First, we compute local and global style codes using a local filter and global pooling, respectively. Then, we align the local style code with the old photo using non-local attention between deep features of the old photo and the reference. Finally, we fuse the aligned local and global style codes, and the fused style code aligns the mean and std of the content features (the old photo in this figure) via AdaIN.
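The sketch below illustrates this idea under simplifying assumptions (a single feature scale; the local style code is already predicted per pixel of the reference). Function names and shapes are ours, not the paper's:

```python
import torch

def adain(content, style_mean, style_std, eps=1e-5):
    """Align the channel-wise mean/std of content features to a style code.
    A spatially varying code of shape (B, C, H, W) also broadcasts correctly."""
    mean = content.mean(dim=(2, 3), keepdim=True)
    std = content.std(dim=(2, 3), keepdim=True)
    return style_std * (content - mean) / (std + eps) + style_mean

def align_local_code(content_feat, ref_feat, local_code):
    """Warp the reference's per-pixel local style code to the old photo's layout
    via non-local attention. All inputs: (B, C, H, W)."""
    B, C, H, W = content_feat.shape
    q = content_feat.flatten(2).transpose(1, 2)       # (B, HW, C)  queries: old photo
    k = ref_feat.flatten(2)                           # (B, C, HW)  keys: reference
    attn = torch.softmax(q @ k / C ** 0.5, dim=-1)    # (B, HW, HW) soft correspondence
    v = local_code.flatten(2).transpose(1, 2)         # (B, HW, C)  values: local code
    return (attn @ v).transpose(1, 2).reshape(B, C, H, W)
```

After fusing the aligned local code with the global one, the fused code would modulate the content features through `adain`.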
The idea of the merging-refinement subnet is to select the most appropriate style from the multiple stylized features for each semantic region of the old photo.
To achieve this, we employ spatial attention that, based on the correlation matrices, spatially strengthens semantically related stylized features and dampens unrelated ones (see the sketch below).
Furthermore, we refine the merged result with a U-Net to produce the final modernization result.
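A rough sketch of the merging step, assuming one stylized feature map and one spatial correlation map per reference (the shapes are our assumption):

```python
import torch

def merge_by_semantic_similarity(stylized_feats, correlations):
    """stylized_feats: list of N tensors (B, C, H, W), one per reference;
    correlations: list of N tensors (B, 1, H, W) scoring how semantically
    related each reference is to the old photo at every location."""
    # Softmax over the reference axis acts as spatial attention: semantically
    # related features are strengthened, unrelated ones dampened, per pixel.
    weights = torch.softmax(torch.cat(correlations, dim=1), dim=1)  # (B, N, H, W)
    feats = torch.stack(stylized_feats, dim=1)                      # (B, N, C, H, W)
    return (weights.unsqueeze(2) * feats).sum(dim=1)                # (B, C, H, W)
```

The merged features then pass through the U-Net for refinement.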
The next key idea of our method is a synthetic data generation scheme. Since the old photo modernization task has no ground truth, we propose a data generation scheme that trains the network in a self-supervised manner. The core idea is to use transformations with style-invariant (SIT) and style-variant (SVT) properties, distinguished by whether the transformation affects the mean and std of any semantic region. For SIT, we employ rigid transformations: translation (for regions that can be translated), rotation, and flipping. For SVT, we use random color jittering and unstructured degradations, i.e., blur, noise, resizing, and compression artifacts.

Given a dataset with dense semantic segmentation masks, we split each mask into two (in general, N) disjoint sets. We then use the resulting binary masks to obtain masked photos. To obtain the input image (c), we apply a different SVT to each masked photo and sum the results. To obtain the style images (s1 & s2), we apply different SITs and fill the unmasked regions with random images from the same dataset (URF).
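Below is a hypothetical sketch of this scheme for N = 2. Here `svt`, `sit`, and `random_photo` stand in for the transformations and URF source described above; none of this is the actual implementation:

```python
import random
import torch

def make_training_sample(photo, seg_mask, random_photo, svt, sit):
    """photo: (3, H, W) modern photo; seg_mask: (H, W) integer semantic labels;
    random_photo: (3, H, W) another dataset image, used for URF;
    svt(img): style-variant transform (color jitter, blur, noise, JPEG, ...);
    sit(img, mask): style-invariant rigid transform applied to image and mask jointly."""
    labels = seg_mask.unique().tolist()
    random.shuffle(labels)
    set1 = torch.tensor(labels[: len(labels) // 2])
    m1 = torch.isin(seg_mask, set1).float().unsqueeze(0)  # binary mask, first set
    m2 = 1.0 - m1                                         # complementary second set
    # Input (content): a different SVT on each masked photo, summed.
    content = svt(photo * m1) + svt(photo * m2)
    # Styles: SIT preserves each region's mean/std; the unmasked region is
    # filled with a random image from the same dataset (URF).
    s1_img, s1_m = sit(photo, m1)
    s2_img, s2_m = sit(photo, m2)
    s1 = s1_img * s1_m + random_photo * (1.0 - s1_m)
    s2 = s2_img * s2_m + random_photo * (1.0 - s2_m)
    return content, s1, s2
```

Since SIT leaves each region's mean and std unchanged, the original modern photo can serve as the training target for the pair (content, s1, s2).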
Select an image from the dropdown below. All of the images are from our dataset. Important note: the references are not shown to avoid copyright issues; for details on the references, please see our GitHub code.
[Interactive comparison: Input | OPR [1] | ExColTran [2] + OPR [1] | ReHistoGAN [3] + OPR [1] | MAST [4] + OPR [1] | PCAPST [5] + OPR [1] | Ours]
Our method can be used for semantic photorealistic style transfer.
[Results: Content | Style | MAST [4] | PCAPST [5] | Ours]
Some modernization results on real old photos in the wild. Note: we do not own any of the old photos or references used for this application; all images were obtained from the internet under CC licenses.
[Results: Old Photo | Reference 1 | Reference 2 | Ours]
[1] Wan, Ziyu, et al. "Bringing Old Photos Back to Life." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.
[2] Yin, Wang, et al. "Yes, 'Attention Is All You Need', for Exemplar based Colorization." Proceedings of the 29th ACM International Conference on Multimedia. 2021.
[3] Afifi, Mahmoud, Marcus A. Brubaker, and Michael S. Brown. "HistoGAN: Controlling Colors of GAN-Generated and Real Images via Color Histograms." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.
[4] Huo, Jing, et al. "Manifold Alignment for Semantically Aligned Style Transfer." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021.
[5] Chiu, Tai-Yin, and Danna Gurari. "PCA-Based Knowledge Distillation Towards Lightweight and Content-Style Balanced Photorealistic Style Transfer Models." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.
@inproceedings{gunawan2023modernizing,
author = {Gunawan, Agus and Kim, Soo Ye and Sim, Hyeonjun and Lee, Jae-Ho and Kim, Munchurl},
title = {Modernizing Old Photos Using Multiple References via Photorealistic Style Transfer},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages = {12460--12469},
year = {2023},
}