FMA-Net: Flow-Guided Dynamic Filtering and Iterative Feature Refinement with Multi-Attention for Joint Video Super-Resolution and Deblurring

Geunhyuk Youk 1     Jihyong Oh† 2     Munchurl Kim† 1
†Co-corresponding authors
1Korea Advanced Institute of Science and Technology, South Korea       
2Chung-Ang University, South Korea
CVPR 2024 Oral

Abstract

We present a joint learning scheme for video super-resolution and deblurring, called VSRDB, to restore clean high-resolution (HR) videos from blurry low-resolution (LR) ones. This joint restoration problem has received far less attention than the individual restoration problems.

We propose a novel flow-guided dynamic filtering (FGDF) and an iterative feature refinement with multi-attention (FRMA), which together constitute our VSRDB framework, denoted as FMA-Net. Specifically, our proposed FGDF enables precise estimation of both spatio-temporally-variant degradation and restoration kernels that are aware of motion trajectories through sophisticated motion representation learning. Compared to conventional dynamic filtering, the FGDF enables the FMA-Net to effectively handle large motions in VSRDB.
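The contrast with conventional dynamic filtering can be illustrated with a small NumPy sketch. This is not the FMA-Net implementation: it assumes a single-channel frame, integer flow offsets, border clamping, and hand-set kernels, and the function name `dynamic_filter` is hypothetical. It only shows why centring each per-pixel kernel at a flow-shifted position lets a small kernel reach content that has moved farther than the kernel radius.

```python
import numpy as np

def dynamic_filter(frame, kernels, flow=None):
    """Apply a per-pixel k x k kernel to a single-channel frame.

    frame:   (H, W) array.
    kernels: (H, W, k, k) array, one kernel per output pixel.
    flow:    optional (H, W, 2) integer array of (dy, dx) offsets (an
             assumption of this sketch). When given, each kernel window is
             centred at the flow-shifted position (flow-guided variant);
             otherwise it is centred at the pixel itself (conventional).
    """
    H, W = frame.shape
    k = kernels.shape[2]
    r = k // 2
    out = np.zeros((H, W), dtype=np.float64)
    for y in range(H):
        for x in range(W):
            cy, cx = y, x
            if flow is not None:
                cy += int(flow[y, x, 0])
                cx += int(flow[y, x, 1])
            for i in range(k):
                for j in range(k):
                    # Clamp sampling coordinates to the image border.
                    sy = min(max(cy + i - r, 0), H - 1)
                    sx = min(max(cx + j - r, 0), W - 1)
                    out[y, x] += kernels[y, x, i, j] * frame[sy, sx]
    return out

# Toy neighbouring frame: a bright dot that sits at (4, 6), i.e. 4 pixels to
# the right of the reference location (4, 2).
neighbour = np.zeros((9, 9))
neighbour[4, 6] = 1.0

# The same 3x3 box kernel at every pixel (real kernels would be predicted).
kernels = np.full((9, 9, 3, 3), 1.0 / 9.0)

# Flow that points from the reference location to where the dot moved.
flow = np.zeros((9, 9, 2), dtype=np.int64)
flow[4, 2] = (0, 4)

conventional = dynamic_filter(neighbour, kernels)
guided = dynamic_filter(neighbour, kernels, flow)
print(conventional[4, 2], guided[4, 2])
```

In this toy case the 3x3 window centred at the reference location never sees the dot, while the flow-shifted window does. A real system would use sub-pixel flow with interpolation and learned kernels over multiple frames.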

Additionally, the stacked FRMA blocks, trained with our novel temporal anchor (TA) loss that temporally anchors and sharpens features, refine features in a coarse-to-fine manner through iterative updates. Extensive experiments demonstrate the superiority of the proposed FMA-Net over state-of-the-art methods in both quantitative and qualitative evaluations.

Network Architecture

Top: The architecture of FMA-Net for joint video super-resolution and deblurring (VSRDB).
Bottom: (a) Structure of the (i+1)-th FRMA block; (b) Concept of our flow-guided dynamic filtering.

Quantitative Results

Quantitative comparison on REDS4 for ×4 VSRDB. All results are calculated on the RGB channels. Red and blue colors indicate the best and second-best performance, respectively. The superscript * indicates that the model is retrained on the REDS training dataset for VSRDB.
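As a rough illustration of what evaluating on the RGB channels means, the sketch below computes PSNR over all three colour channels of an 8-bit image pair. It assumes PSNR is among the reported metrics (the table itself is not reproduced here) and a peak value of 255; the function name `psnr_rgb` is hypothetical and this is not the authors' evaluation script.

```python
import numpy as np

def psnr_rgb(pred, target, max_val=255.0):
    """PSNR in dB over all RGB channels of two (H, W, 3) images."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    # Mean squared error averaged over every pixel and every channel.
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

# Toy example: a constant error of 10 intensity levels in every channel.
target = np.full((4, 4, 3), 100, dtype=np.uint8)
pred = np.full((4, 4, 3), 110, dtype=np.uint8)
score = psnr_rgb(pred, target)
print(f"{score:.2f} dB")
```

Averaging the error over all three channels differs from protocols that first convert to a luminance (Y) channel, so scores from the two conventions are not directly comparable.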

Qualitative Results

Visual comparison results of different methods for ×4 VSRDB.

Acknowledgement

This work was supported by the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT): No. 2021-0-00087, Development of high-quality conversion technology for SD/HD low-quality media and No. RS2022-00144444, Deep Learning Based Visual Representational Learning and Rendering of Static and Dynamic Scenes.

BibTeX

@InProceedings{Youk_2024_CVPR,
    author    = {Youk, Geunhyuk and Oh, Jihyong and Kim, Munchurl},
    title     = {FMA-Net: Flow-Guided Dynamic Filtering and Iterative Feature Refinement with Multi-Attention for Joint Video Super-Resolution and Deblurring},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {44-55}
}